License: CC BY 4.0
arXiv:2402.17495v1 [cs.SI] 27 Feb 2024

The Unwanted Dissemination of Science: The Usage of Academic Articles as Ammunition in Contested Discursive Arenas on Twitter

Richard Zhang, Northwestern University, USA, and Emőke-Ágnes Horvát, Northwestern University, USA
(2024)
Abstract.

Twitter is a common site of offensive language. Prior literature has shown that the emotional content of tweets can heavily impact their diffusion when discussing political topics. We extend prior work to look at offensive tweets that link to academic articles. Using a mixed-methods approach, we identify three findings: first, offensive language is common in tweets that refer to academic articles, and its prevalence varies widely by subject matter. Second, discourse analysis reveals that offensive tweets commonly use academic articles to promote or attack political ideologies. Lastly, we show that offensive tweets reach a smaller audience than their non-offensive counterparts. Our analysis of these offensive tweets reveals how academic articles are being shared on Twitter not for the sake of disseminating new knowledge, but rather as argumentative tools in controversial and combative discourses.

social media, offensive language, politicization of science, dissemination of science, twitter
copyright: acmlicensed; journalyear: 2024; doi: XXXXXXX.XXXXXXX; conference: November 09–13, 2024, San José, Costa Rica; isbn: 978-1-4503-XXXX-X/18/06; ccs: Human-centered computing → Empirical studies in collaborative and social computing

1. Introduction

The dissemination of information on online social media is pertinent within varied contexts, such as political, health, and crisis messaging. Twitter, known as “X” as of July 2023, is a popular social media site that allows its users to send short-form messages, formerly known as tweets, to their network of followers. Within politics, Twitter is used by politicians to advocate for their political agendas and campaigns. For health news, the diffusion of health policies and recommendations informs the public in a timely manner (Kullar et al., 2020). Warnings of environmental hazards and emergencies are often transmitted and retransmitted on Twitter (Sutton et al., 2015). Academics are active on Twitter to increase the visibility of their work and to expand their networks. Though some of these uses of Twitter can be characterized as healthy, productive, or innocuous, Twitter is also a common site of harmful behavior such as aggression, bullying, and misinformation.

This paper contextualizes how scholars’ academic works are used on what was formerly Twitter, where a user may link or cite a paper for a variety of reasons. Academics and researchers often promote their articles on Twitter (Zakhlebin and Horvát, 2020). Greater dissemination of research articles has been shown to favorably impact scholars’ Twitter following and citation counts (Ortega, 2016; Luc et al., 2021). These are examples of the productive forces of science: new papers and new knowledge are disseminated to a wider range of people, and their authors receive recognition as a result. This paper focuses on the less conspicuous and perhaps more unwanted side of the dissemination of science. In particular, we aim to identify how science can be used as fuel in discursive social media arenas, and how it is utilized on Twitter as an unproductive tool of harassment and argumentation. Prior work has studied the role science plays in contested discursive arenas and in small, specific sites (Buchanan, 2013; Anderson and Huntington, 2017; Hmielowski et al., 2014), but broad overviews of how science is weaponized across different discursive arenas on social media have not been conducted. To find and analyze how science is utilized negatively in discursive arenas, we identified how science was used in conjunction with offensive language on Twitter.

Offensive language is defined as hurtful, derogatory, or obscene utterances, and includes insults, profanity, and general abuse (Schmidt and Wiegand, 2017; Zampieri et al., 2019). The study of offensive language is particularly salient to the discourse surrounding the academic community and science in general. One pertinent aspect of this is the skepticism and politicization that are attached to academia. Politicization of science and distrust of science are well documented (Gauchat, 2012). This politicization can elicit offensive language in discussions of science on Twitter. Discussions of science and academia on Twitter have underlying political leanings, which can encourage the use of abusive language (Anderson and Huntington, 2017; Mosleh et al., 2021). Topics such as global warming invite partisan media attacks and skepticism (Hmielowski et al., 2014). Certain academic disciplines, such as political science and sociology, are by nature subject to partisan divides. What are offensive tweets talking about, and how do they fall under broader discursive subjects? The answers can have broader implications; Gauchat (Gauchat, 2012), for instance, suggests that higher visibility and science education lead to distrust, rather than trust, of science in conservative crowds. Furthermore, offensive language in academic contexts can detract from the credibility of researchers’ findings (Barnes et al., 2018; König and Jucks, 2019). By using a mixed-methods approach with topic modeling and critical discourse analysis (CDA), we identify the topics of tweets containing offensive language that reference academic articles, as well as the broader discourses that they fall under. Lastly, we want to distinguish whether academic articles are utilized to correct misconceptions about scientific subjects or used in more malicious ways, and ask whether the practice of aggressively citing academic articles is employed by users from one or many ideological attitudes.
In essence, we ask:

RQ1: How do offensive tweets citing academic articles fall within broader discursive and ideological boundaries?

To analyze the effects of offensive language in these tweets, we test how offensive language affects their dissemination. The goal of scholarly work is ostensibly to produce new knowledge, but previous literature on tweets’ virality suggests that tweets with more pronounced sentiment disseminate more widely (Zafra et al., 2021; Tsugawa and Ohsaki, 2017, 2015; Stieglitz and Dang-Xuan, 2013; Antypas et al., 2023; Pivecka et al., 2022). If offensive language positively impacts the spread of academic articles, the consequences would be unfavorable, possibly hurting the reputations of both academics and academia and shifting the motive of sharing articles away from sharing knowledge toward less auspicious reasons. Thus, is the visibility of tweets citing academic literature in conjunction with negative language higher? Does offensive language in tweets that reference academic articles increase their virality, as it does for political, negatively valenced tweets? A study of their virality can provide meaningful insights into whether these tweets reach larger audiences, and whether they reach them more quickly than non-inflammatory tweets citing academic articles. We return to the idea of comparing the productive dissemination of science with the unwanted kind; finding that offensive tweets spread wider or faster than their neutral or positive counterparts may reveal a dangerous paradigm of how science is cited and used on social media. In summary:

RQ2: Does offensive language in tweets that reference academic papers increase or decrease their virality?

This paper makes two main contributions. First, we use a mixed-methods approach that shows, for the first time in the context of science dissemination, that academic articles are being both weaponized and criticized with offensive language on Twitter. Second, we provide empirical evidence of where these tweets are found, what they discuss, and how they spread.

2. Related Work

2.1. Tweet Virality

Initial work on tweet virality characterized virality as the number of retweets and concluded that emotionally charged terms and emoticons, as well as the presence of hashtags, URLs, and usernames, affected retweet volume and likelihood (Naveed et al., 2011; Suh et al., 2010). Hong and Davison (Hong and Davison, 2010) identified topics generated from topic modeling, a natural language processing technique that algorithmically generates topics from a corpus of documents, as significant factors affecting retweet likelihood. More concrete measures of virality have been developed and utilized since then. Stieglitz & Dang-Xuan (Stieglitz and Dang-Xuan, 2013) measured the virality of Twitter messages related to German state parliament elections through retweet volume and retweet speed. They define retweet speed as the difference in time between the original tweet and its first retweet (Stieglitz and Dang-Xuan, 2013). Tsugawa and Ohsaki (Tsugawa and Ohsaki, 2015) extended this measure of retweet speed to the time between an original tweet and its nth retweet.

These papers analyzed sentiment in terms of the valence-arousal space, which has a valence dimension (negative to positive) and an arousal dimension (calm to excited) (Posner et al., 2005). Tsugawa and Ohsaki (Tsugawa and Ohsaki, 2015) and Stieglitz & Dang-Xuan (Stieglitz and Dang-Xuan, 2013) indicate that any non-neutral sentiment (i.e., positive or negative) increases retweet volume and retweet speed. Pivecka et al. (Pivecka et al., 2022) and Zafra et al. (Zafra et al., 2021) borrow from previous techniques within specific political contexts; Pivecka et al. (Pivecka et al., 2022) showed that for tweets from Austrian politicians, tweets with high arousal had higher retweet volumes than those with low arousal, and that tweets with negative valence were associated with decreased retweet volume while tweets with positive valence were associated with increased retweet volume. Zafra et al. (Zafra et al., 2021) showed an increased retweet volume for tweets with more positive words and a decreased retweet volume for tweets containing more negative words in their dataset of tweets that referenced the Catalan independence referendum in 2017. These results do not show a consistent direction for the effect of sentiment on tweet virality, and the effect is expected to be closely tied to the content of each dataset. The domain of the dataset is significant; for instance, Zafra et al. (Zafra et al., 2021) hypothesized that negative tweets would garner more retweets due to the unpopularity of the Catalan independence movement in Spain. These trends have not been analyzed with regard to academic scholarship and how academic articles are spread.

2.2. Relationships Between Virality, Twitter, Science, and Offensive Language

Virality has substantial bearing on the influence and exposure of academics and their articles. For academic researchers, the dissemination of their work on Twitter can increase their follower count and, by way of an increased follower count, their citation count (Ortega, 2016), which becomes salient considering the increasing importance of publication metrics to a researcher’s success over the last century (Fire and Guestrin, 2019). In a study of the effects of articles published by a social media network of cardiothoracic surgery news, authors of the referenced papers gained substantially increased citation counts over a one-year period and larger Twitter followings (Luc et al., 2021). For researchers, having a single article or tweet go viral substantially affects their short- and long-term visibility, as they gain more followers than researchers who have not experienced virality (Hasan et al., 2022).

Despite researchers’ recognition of how Twitter can be helpful in developing academic networks, scientists who have left Twitter entirely have cited increased hostility and right-wing trolls as reasons for leaving (Valero, 2023). One reason may be that academic disciplines and their related articles can be controversial on social media sites due to the politicization of science. The politicization of science has increased over the past few decades; Gauchat (Gauchat, 2012) analyzes how conservatives’ distrust of science increased from 1974 to 2010, and finds that increased science education and visibility predict increasing distrust in conservative networks. One of the most visible topics that is subject to politicization is climate change; over the last several decades, debates on climate change have become increasingly drawn along political lines (Chinn et al., 2020; Dunlap and McCright, 2010). This politicization of science can also inform how and when offensive language is used. Incivility, defined as an attack on a person’s character, was found to be used in association with right-leaning political topics in a discourse analysis of Twitter discussions of climate change (Anderson and Huntington, 2017). With the politicization and controversy of science, analysis of how offensive language emerges in arguments over “correcting” political news can also inform us of the nature of offensive language that occurs in tweets referencing academic articles. In Mosleh et al. (Mosleh et al., 2021), tweet chains debating political news were found to increase in toxicity as debates became heated.

Computational approaches to several facets of offensive language have been researched, such as the development of natural language processing models that perform automatic detection of offensive language on social media (Chen et al., 2012; Zampieri et al., 2019; Camacho-Collados et al., 2022) and the closely related automatic detection of hate speech (Chen et al., 2012; Fortuna and Nunes, 2018; Schmidt and Wiegand, 2017; Zampieri et al., 2019) and cyberbullying (Chatzakou et al., 2017). Silva et al. (Silva et al., 2016) propose a categorization of hate speech on Twitter and the social media website Whisper through a lexical comparison with terminology from Hatebase and the FBI’s legal language of hate crimes, forming categories such as “race,” “ethnicity,” “sexual orientation,” and “gender.” Their lexical analysis of tweets reveals that behavior and race are the predominant targets of hate speech (Silva et al., 2016). Conversely, Evkoski et al. (Evkoski et al., 2021) seek the source of hate speech in Slovenian tweets, revealing that the majority of offensive tweets are emitted by a single community that is dominated by political and ideological slants.

2.3. Critical Discourse Analysis and Topic Modeling

The most pervasive view of tweet virality is that virality emerges from a message’s textual content (Guerini et al., 2021, 2012). One way scholars have categorized tweets to understand their content is through topic modeling. Topic modeling is a natural language processing technique that clusters documents, such as tweets, and generates the most representative words for each cluster. The most common method of topic modeling is Latent Dirichlet Allocation (LDA) (Blei et al., 2003). LDA, though commonly used to generate topics on Twitter datasets, assumes that the documents are much longer than the 280-character maximum of a tweet. Various short text topic modeling (STTM) algorithms have been developed for datasets such as tweets; Wu et al. (Wu et al., 2020) propose a neural-network-based model that optimizes for coherence score and topic uniqueness and outperforms LDA. They utilize a short-text encoding, a novel embedding technique, and negative sampling to achieve greater topic diversity and topic coherence (Wu et al., 2020). The implication of this is that tweets belong to fewer topics, and identifying a document’s membership in a topic can aid the interpretation of topics.

Critical discourse analysis (CDA) investigates dynamics of ideology and social problems through analyzing discourse (Johnson and McLean, 2020). CDA is relevant for our problem in two ways: first, it allows us to contextualize topics generated from topic modeling, which can be incoherent and difficult to explain, and identify their overarching discourses. Second, a document-focused discourse analysis colors the language of specific tweets to reveal relationships between tweet virality, offensive language, and the politicization of science. Jacobs and Tschötschel (Jacobs and Tschötschel, 2019) note that topic models can act as decompositions for broader “subjects” or “themes,” and the topics act as a “collection of patterns of language use representing those themes.” In reverse, we can use CDA to group seemingly incoherent topics into their broader, more explainable discourses, as suggested by Aranda et al. (Aranda et al., 2021). Performing document-focused CDA over a topic model appropriately fits the assumption that the virality of a document emerges from the content of its text; understanding the context of a topic and its broader ideologies reveals qualities of the tweets that belong to it.

3. Data

We obtained a dataset from Altmetric LLC that tracks mentions of over 24 million research articles on a range of publicly available internet sources that include news outlets, blogs, Twitter, Facebook, Reddit, and Wikipedia (AltMetric, 2023). Our sample contains tweets mentioning the 9,650 most mentioned articles, collected from their API for the period from January 1, 2016 to October 8, 2018. A mention indicates that the body of a media post, such as a tweet, Facebook post, or blog entry, contains a URL link to the respective article. The dataset does not capture posts that respond to those captured within the Altmetric data. The resulting dataset contains 3,140,493 tweets, including retweets, made by 748,283 unique users. Each row of the dataset contains the original text of the tweet, the DOI being linked in the tweet, a unique post-id, an anonymised identifier for the Twitter user, and whether or not the tweet is a retweet.

We filter the data with the intent of both generating topics and evaluating sentiment and offensive language. The content of the linked articles, such as their titles, was removed from the texts so that subsequent analyses focus only on user-generated text. An initial survey showed that a significant number of tweets only included the title; these tweets were removed. The final dataset contains 2,906,058 tweets made by 748,104 users. Of these tweets, 760,799 are original tweets, and 2,145,259 are retweets. Since our dataset does not indicate the source tweet of a retweeted tweet, we created a retweet dataset by grouping all tweets by their text and DOI and selecting only unique pairs of text and DOI for each unique tweet. Under this scheme, any retweet with matching text and DOI is treated as a retweet of the original tweet with that text and DOI pair. Functionally, this also means eliminating any tweets that have repeat text and repeat DOI, further filtering our 760,799 original tweets to 295,078 unique tweets that have 0 or more retweets.
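The grouping step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the row keys `text`, `doi`, and `is_retweet` are hypothetical names, and the sketch assumes that a retweet's text has already been normalized so that it matches the text of its original tweet.

```python
from collections import defaultdict


def build_retweet_dataset(rows):
    """Collapse tweets into unique (text, DOI) pairs with retweet counts.

    `rows` is an iterable of dicts with the hypothetical keys
    'text', 'doi', and 'is_retweet' (bool).
    """
    originals = set()            # (text, doi) pairs seen as original tweets
    retweets = defaultdict(int)  # (text, doi) -> number of matching retweets
    for row in rows:
        key = (row["text"], row["doi"])
        if row["is_retweet"]:
            retweets[key] += 1
        else:
            originals.add(key)   # repeated originals collapse into one entry
    # Every unique original tweet keeps a count of 0 or more retweets.
    return {key: retweets.get(key, 0) for key in originals}
```

Retweets whose (text, DOI) pair never appears as an original tweet are dropped in this sketch; the source does not say how such cases were handled.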

We assigned each tweet an academic subject with the use of OpenAlex. OpenAlex is a catalog of academic entities and their relations; it archives “works” (such as journal articles, books, and theses), their authors, their subject matters, sources, institutions, and publishers within a relational graph structure. OpenAlex organizes its subjects with a “level” system; the highest-level subjects, such as Physics or Medicine, are at “level 0,” and deeper levels contain more specificity. It labels works’ subjects with a classifier trained on Microsoft’s Academic Graph, a heterogeneous graph of academic articles that utilizes automatic concept tagging (Priem et al., 2022). We queried each DOI in our dataset for its level 0 concepts, and assigned each tweet the concept associated with the DOI being referenced. There are a total of 19 level 0 concepts, by which we organize the tweets.

4. Methodology

4.1. Offensive Language and Sentiment Classification

To find which tweets are offensive, and to analyze sentiment and offensive language as factors affecting virality, we used offensive language and sentiment classification. We use the TweetEval model within the Python package TweetNLP to perform both offensive language classification and sentiment analysis. It classifies sentiment by valence (negative to positive). We encode valence as binary indicator variables for positive, neutral, and negative. TweetEval is a RoBERTa-based self-supervised language model that has been trained on SemEval 2019’s offensive language and SemEval 2017’s sentiment challenges as a baseline and was later pre-trained on a corpus of 60M tweets (Camacho-Collados et al., 2022). For these tasks, we filtered the dataset to the 760,799 original tweets in our dataset.

For each discipline, we compile the total number of tweets, the number of offensive tweets, and the percentage of tweets that are offensive. We then test for an association between discipline and rates of offensive language with a Kruskal-Wallis rank sum test.
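For readers unfamiliar with the test, the Kruskal-Wallis statistic can be computed as in the sketch below. It is an illustration under assumptions: the source does not specify the unit of observation, so the grouping used here (one sample of values per discipline) is hypothetical, and the function returns only the tie-corrected H statistic, not a p-value.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic.

    `groups` is a list of non-empty samples (lists of numbers), one per
    group. Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    counts = Counter(pooled)
    # Average rank per distinct value: tied values share the mean rank.
    avg_rank = {}
    position = 0
    for value in sorted(counts):
        tie_size = counts[value]
        avg_rank[value] = position + (tie_size + 1) / 2
        position += tie_size
    # H = 12 / (N (N + 1)) * sum_j(R_j^2 / n_j) - 3 (N + 1)
    rank_term = sum(
        sum(avg_rank[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_term = sum(t ** 3 - t for t in counts.values())
    return h / (1 - tie_term / (n_total ** 3 - n_total))
```

Under the null hypothesis, H is approximately chi-squared distributed with one fewer degree of freedom than the number of groups, which is how a p-value would normally be obtained.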

4.2. Topic Modeling and Topic Interpretation

We use Wu et al.’s (Wu et al., 2020) implementation of the Negative sampling and Quantization Topic Model (NQTM) over the tweets that were classified as offensive in the data annotation step. To optimize the number of topics, we perform a grid search over the number of topics $K \in \{10, 25, 50, 100\}$ and select the topic model with the highest coherence score $c_v$ with a reasonable topic uniqueness.
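One way to make this selection rule concrete is sketched below. The threshold `min_uniqueness` is a hypothetical stand-in for "reasonable topic uniqueness," which the text does not quantify, and the scores in the usage note are placeholders, not results from the paper.

```python
def select_num_topics(results, min_uniqueness):
    """Pick K with the highest coherence among sufficiently unique models.

    `results` maps K -> (coherence, uniqueness). `min_uniqueness` is an
    assumed cutoff; the source does not state one.
    """
    eligible = {
        k: scores for k, scores in results.items() if scores[1] >= min_uniqueness
    }
    if not eligible:
        raise ValueError("no model meets the uniqueness threshold")
    # Among eligible models, the highest coherence score wins.
    return max(eligible, key=lambda k: eligible[k][0])
```

For example, with placeholder scores `{10: (0.40, 0.95), 25: (0.45, 0.90), 50: (0.50, 0.60), 100: (0.42, 0.85)}` and a cutoff of 0.8, the model with K = 50 is excluded despite its higher coherence, and K = 25 is selected.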

We then perform CDA over the topics to generate broader discourses into which we can group the topics. Afterwards, we categorize each topic with respect to their linkages to different contexts, as suggested by Aranda et al. (Aranda et al., 2021). We perform CDA on these topics by qualitatively analyzing the terms associated with each topic and their most representative documents. We select the 5 most representative documents for each topic $i \in [1, K]$, where $K$ is the number of topics selected in our topic model. This can be written as $\operatorname{argmax}(\mathbf{\Theta}_{:,i})[-5:]$, where $\mathbf{\Theta}$ is the document-topic distribution matrix of size $D \times K$, and $\operatorname{argmax}(\mathbf{\Theta}_{:,i})$ ranks the most representative documents (e.g., a document with the score of “1” would be distributed solely to topic $i$, and thus be most representative). To contextualize topics, we also compile the most frequent topics for each discipline $Disc$. For this step, we assign each offensive tweet $i \in Disc$ the topic with the highest value in their topic distribution, $\operatorname{argmax} \Theta_{i,:}$, where $Disc$ is the academic discipline their referenced article belongs to. We use these contexts to inform our CDA to generate topic categories.
By analyzing topics’ most representative texts, and integrating them within broader themes of offensive language, politicization, and their relationships to academic disciplines, we can formulate broader topic categories that can contextualize our more granular topics generated by topic modeling.
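The two selection steps in this section can be sketched in plain Python. This is an illustration, not the authors' implementation: `theta` is assumed to be a list of rows (one per document) of topic weights, and the function names are hypothetical. The ranking expression in the text is read here as sorting documents by their weight on a topic and keeping the top five.

```python
from collections import Counter


def top_documents(theta, topic, n=5):
    """Indices of the n documents with the largest weight on `topic`, best first."""
    order = sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)
    return order[:n]


def dominant_topic(theta_row):
    """Index of the topic with the highest weight in one document's distribution."""
    return max(range(len(theta_row)), key=lambda k: theta_row[k])


def topic_counts_by_discipline(theta, disciplines):
    """Count dominant topics per discipline; disciplines[d] labels document d."""
    counts = {}
    for row, disc in zip(theta, disciplines):
        counts.setdefault(disc, Counter())[dominant_topic(row)] += 1
    return counts
```

Calling `counts[disc].most_common()` on the result then lists the most frequent topics for a discipline.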

4.3. Tweet Virality

We use regression analysis to determine the virality of all the tweets in our dataset. We test two measures of virality: retweet frequency and the time between the original tweet and its 25th retweet. For our first regression analyses, we consider the total volume of retweets for every tweet in our dataset, $numRTs$, as the dependent variable. Our second set of regression analyses considers the time between the original tweet and the 25th retweet, $timeRT25$. For both models, we consider the following variables:

  • Negative - Binary variable representing a negative valence assignment from our sentiment classification

  • Positive - Binary variable representing a positive valence assignment

  • Followers - Number of Twitter followers the original user had at the time of the tweet

  • Offensive - Binary variable representing offensive classification

  • Hash - Binary variable representing presence of one or more hashtags

We consider all the above variables following prior literature that has shown that valence (negative, positive), followers, and hashtags are significant factors contributing to the virality of a tweet (Zafra et al., 2021; Tsugawa and Ohsaki, 2017, 2015; Stieglitz and Dang-Xuan, 2013; Antypas et al., 2023; Pivecka et al., 2022), in addition to the offensive variable.
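To make the variable definitions concrete, the sketch below builds one regression row for a single original tweet. Several details are assumptions not stated in the text: the label strings (`"negative"`, `"positive"`, `"offensive"`), the use of seconds as the time unit, the simple `#` check for hashtags, and the choice to leave `timeRT25` undefined for tweets with fewer than 25 retweets.

```python
def tweet_features(sentiment_label, offensive_label, text, followers, retweet_delays):
    """Build one regression row from classifier labels and retweet timing.

    `retweet_delays` holds the time elapsed (assumed here to be seconds)
    between the original tweet and each of its retweets.
    """
    delays = sorted(retweet_delays)
    return {
        # Neutral valence is the baseline: both indicators are 0.
        "negative": int(sentiment_label == "negative"),
        "positive": int(sentiment_label == "positive"),
        "followers": followers,
        "offensive": int(offensive_label == "offensive"),
        # Simplified hashtag check; a real tokenizer may be stricter.
        "hash": int("#" in text),
        "numRTs": len(delays),
        # Delay of the 25th retweet; None when there are fewer than 25.
        "timeRT25": delays[24] if len(delays) >= 25 else None,
    }
```

Treating neutral sentiment as the reference level matches the variable list above, which includes only the Negative and Positive indicators.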

We use a negative binomial generalized linear model for these regressions, as the mean and standard deviation of our retweet volume $numRTs$ suggest over-dispersion ($\mu = 3.81$, $\sigma = 21.83$), as do those of $timeRT25$ ($\mu = 7327$, $\sigma = 51091$). The regression models are the following:

(1) $\log(numRT) = \beta_0 + \beta_1\,negative + \beta_2\,positive + \beta_3\,followers + \beta_4\,offensive + \beta_5\,hash$
(2) $numRT = e^{\beta_0} \times e^{\beta_1\,negative} \times e^{\beta_2\,positive} \times e^{\beta_3\,followers} \times e^{\beta_4\,offensive} \times e^{\beta_5\,hash}$
(3) $\log(timeRT25) = \beta_0 + \beta_1\,negative + \beta_2\,positive + \beta_3\,followers + \beta_4\,offensive + \beta_5\,hash$
(4) $timeRT25 = e^{\beta_0} \times e^{\beta_1\,negative} \times e^{\beta_2\,positive} \times e^{\beta_3\,followers} \times e^{\beta_4\,offensive} \times e^{\beta_5\,hash}$
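Two points in this specification lend themselves to a short worked example: the over-dispersion check and the multiplicative reading of the coefficients in the exponentiated forms. The sketch below is illustrative only; any coefficient values passed to it are hypothetical placeholders, not estimates from the paper.

```python
import math


def dispersion_ratio(mean, std):
    """Variance-to-mean ratio; values well above 1 indicate over-dispersion."""
    return std ** 2 / mean


def expected_outcome(coefs, features):
    """Expected outcome under a log link: exp(b0 + sum_k b_k * x_k).

    `coefs` maps variable names (plus 'intercept') to coefficients;
    `features` maps the same variable names to observed values.
    """
    linear = coefs["intercept"] + sum(
        coefs[name] * value for name, value in features.items()
    )
    return math.exp(linear)
```

With the reported moments, `dispersion_ratio(3.81, 21.83)` is about 125, far above the value of 1 implied by a Poisson model, which motivates the negative binomial choice. Because the model is multiplicative on the original scale, switching a binary variable such as offensive from 0 to 1 scales the expected outcome by $e^{\beta_4}$ while the other variables are held fixed.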

We perform further analysis by decomposing the offensive factor for each tweet that is classified as offensive. For this, we assign each offensive tweet $i$ the topic with the highest value in its topic distribution $\Theta$, $\operatorname{argmax} \Theta_{i,:}$. Our second decomposition assigns each offensive tweet a variable representing the topic categories generated from our CDA. We evaluate each model’s fit with its $R^2$ value. This can be modeled as the following, where $\text{category}_i$ is a placeholder for the topic categories we will discover.

(5) $\log(\mathit{numRT}) = \beta_{0} + \beta_{1}\,\mathit{negative} + \beta_{2}\,\mathit{positive} + \beta_{3}\,\mathit{followers} + \beta_{4}\,\mathit{hash} + \beta_{5}\,\mathit{category}_{1} + \dots + \beta_{j+4}\,\mathit{category}_{j}$
(6) $\mathit{numRT} = e^{\beta_{0}} \times e^{\beta_{1}\mathit{negative}} \times e^{\beta_{2}\mathit{positive}} \times e^{\beta_{3}\mathit{followers}} \times e^{\beta_{4}\mathit{hash}} \times e^{\beta_{5}\mathit{category}_{1}} \times \dots \times e^{\beta_{j+4}\mathit{category}_{j}}$
(7) $\log(\mathit{timeRT25}) = \beta_{0} + \beta_{1}\,\mathit{negative} + \beta_{2}\,\mathit{positive} + \beta_{3}\,\mathit{followers} + \beta_{4}\,\mathit{hash} + \beta_{5}\,\mathit{category}_{1} + \dots + \beta_{j+4}\,\mathit{category}_{j}$
(8) $\mathit{timeRT25} = e^{\beta_{0}} \times e^{\beta_{1}\mathit{negative}} \times e^{\beta_{2}\mathit{positive}} \times e^{\beta_{3}\mathit{followers}} \times e^{\beta_{4}\mathit{hash}} \times e^{\beta_{5}\mathit{category}_{1}} \times \dots \times e^{\beta_{j+4}\mathit{category}_{j}}$
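
The decomposition above can be illustrated with a minimal Python sketch. It is not the authors' code: the topic distribution, the topic-to-category mapping, and every coefficient value below are made up for illustration. It shows the dominant-topic assignment, the one-hot category variables, and that the additive log form and the multiplicative form describe the same quantity.

```python
import math

def dominant_topic(theta_row):
    """Return the index of the largest entry in one tweet's topic distribution."""
    return max(range(len(theta_row)), key=lambda k: theta_row[k])

def log_form(coefs, x):
    """Additive form on the log scale, as in equations (5) and (7)."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in x.items())

def product_form(coefs, x):
    """Multiplicative form on the original scale, as in equations (6) and (8)."""
    factors = [math.exp(coefs[name] * value) for name, value in x.items()]
    return math.exp(coefs["intercept"]) * math.prod(factors)

# Hypothetical topic distribution of one offensive tweet over four topics.
theta_i = [0.10, 0.65, 0.05, 0.20]
topic = dominant_topic(theta_i)

# Hypothetical mapping from topic index to a manually coded topic category.
topic_to_category = {0: "Science", 1: "Political", 2: "Race", 3: "Other"}
category = topic_to_category[topic]

# Made-up coefficients, chosen only to exercise the formulas.
coefs = {"intercept": 0.5, "negative": 0.04, "positive": 0.01,
         "followers": 1.0, "hash": 0.14,
         "Science": -0.38, "Political": -0.10, "Race": -0.15, "Other": -0.36}

# Feature vector: the category enters as one-hot indicator variables.
x = {"negative": 1, "positive": 0, "followers": 2.0, "hash": 1}
for name in sorted(set(topic_to_category.values())):
    x[name] = 1 if name == category else 0

additive = math.exp(log_form(coefs, x))
multiplicative = product_form(coefs, x)
```

Because exponentiation turns sums into products, `additive` and `multiplicative` agree up to floating-point error, which is why each coefficient can be read as a multiplicative factor on the outcome.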

5. Results

5.1. The Social Sciences are associated with more Offensive Tweets

We found 12,380 out of 760,799 original tweets to be offensive (1.63%, see Table 1). Note that a tweet can be both offensive and negative, as they are results from two different classifiers (one for valence, another for offensive language).

We examine the relationship between academic disciplines and the rates of offensive classifications exhibited in each discipline. As shown in Table 2, there is a clear distinction between the rates of offensive language in different disciplines. We test their association with a Kruskal-Wallis rank sum test, as the underlying distributions are dissimilar. The results indicate that academic discipline is strongly associated with offensive language ($X^{2} = 5383$, p < 2.2e-16).
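
For readers unfamiliar with the test, the following is a minimal pure-Python sketch of the tie-corrected Kruskal-Wallis H statistic using the standard textbook formula. It is not the authors' implementation, and the toy per-discipline labels are hypothetical. Under the null hypothesis, H approximately follows a chi-squared distribution with one fewer degree of freedom than the number of groups.

```python
from collections import Counter

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples.

    Assumes at least two distinct values overall; otherwise the tie
    correction is zero and the statistic is undefined.
    """
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Tied values share the mean of the ranks they would occupy.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        tie_size = counts[value]
        rank_of[value] = position + (tie_size + 1) / 2
        position += tie_size

    # Squared rank sum of each group, scaled by the group size.
    scaled = sum(sum(rank_of[v] for v in group) ** 2 / len(group)
                 for group in groups)
    h = 12 / (n_total * (n_total + 1)) * scaled - 3 * (n_total + 1)

    # Correction factor for ties, which dominate with binary labels.
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n_total ** 3 - n_total)
    return h / correction

# Hypothetical offensive (1) / non-offensive (0) labels per discipline.
toy_labels = {
    "Philosophy": [1, 1, 0, 0, 0, 0, 0, 0],
    "Math": [0, 0, 0, 0, 0, 0, 0, 1],
    "Geology": [1, 0, 0, 0, 1, 0, 0, 0],
}
h_toy = kruskal_wallis_h(list(toy_labels.values()))
```

With binary labels nearly every observation is tied, so the tie correction matters far more here than it would for continuous data.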

Tweets about Philosophy papers have the highest rate of offensive language, with 5.2% of all unique tweets in Philosophy classified as offensive. Furthermore, we find that the Humanities and the Social Sciences have the highest rates of offensive classifications; of the eight disciplines with the highest incidence of tweets with offensive language, six (Philosophy, Political Science, Psychology, Sociology, Geography, Economics) are either Humanities or Social Sciences. The two disciplines with the lowest incidence of offensive language are Math (0.9%) and Engineering (0.9%). We can relate this back to the politicization of science, and the implied connection between politicization and offensive language. Humanities and Social Sciences are political by nature, and hence there is an intuitive justification for their strong representation in Table 2. Furthermore, the two sciences that are represented in the top eight (Geology, 3.3%, and Environmental Science, 2.3%) are concerned with subjects of controversy such as global warming and fossil fuels.

Table 1. Classification results

| Class | Total Tweets | % |
|---|---|---|
| Negative | 153,860 | 20.22 |
| Neutral | 490,783 | 64.51 |
| Positive | 116,156 | 15.27 |
| Offensive | 12,380 | 1.63 |
| Overall | 760,799 | |

Table 2. Rates of offensive language by discipline, sorted.

| Discipline | Total Tweets | Offensive | % |
|---|---|---|---|
| Philosophy | 10,714 | 555 | 5.2 |
| Geology | 32,399 | 1,081 | 3.3 |
| Political Science | 91,742 | 2,271 | 2.5 |
| Psychology | 114,975 | 3,671 | 2.5 |
| Environmental Science | 53,897 | 1,236 | 2.3 |
| Geography | 62,616 | 1,441 | 2.3 |
| Sociology | 44,510 | 946 | 2.1 |
| Economics | 59,725 | 1,137 | 1.9 |
| Materials Science | 9,646 | 142 | 1.5 |
| Biology | 228,964 | 3,540 | 1.5 |
| History | 10,557 | 152 | 1.4 |
| Chemistry | 31,031 | 423 | 1.4 |
| Business | 45,610 | 602 | 1.3 |
| Computer Science | 121,567 | 1,534 | 1.3 |
| Art | 4,604 | 58 | 1.3 |
| Medicine | 415,233 | 5,354 | 1.3 |
| Physics | 36,113 | 390 | 1.1 |
| Math | 27,378 | 259 | 0.9 |
| Engineering | 28,148 | 266 | 0.9 |

Table 3. Sample of topics for each topic category. N denotes the number of topics in the topic category.

| Topic category | Sample topic (top words) |
|---|---|
| Science (N=24) | gene eye decreases gmo glyphosate regions expression midbrain fore sucks throwing regulate arse kale glucose |
| | denialist address evidence cited willfully sources scientific paranoid lack free blogs debunks troll disingenuous needed |
| Political (N=11) | nuclear noncombatants hiroshima revisiting weapons iran makes dumb smartphone dope phone dumber smart politics bonkers |
| | sold sales unethical tactics bullshit clinton salesman manipulator scale notmypresident logic statements blow lacks term |
| Race (N=2) | biased holy crap narrative false horribly racial presents adult translation homicides article mit cking ethnic |
| | black white racist racism pain thicker anxiety skin ppl doctors cops voted worry heroin opioids |
| Gender and Sexuality (N=7) | men volume active women artery sexually coronary monogamous plaque testosterone relationships treatment greater patriarchy bitch |
| | women gay heterosexual lesbian experiences orgasm ages sample pleasure touching frequency genital bisexual sexual differences |
| Religion (N=3) | kitchen sponges vertebrate kinds billions sterilizing humans islam united monster racialization bacteria islamophobia flying brown |
| | brain damage religious sleep injury chicken fundamentalism eats suffering head altered function dysbiosis pox woodpeckers |
| Other (N=3) | fucked dick fascinating model title lot enjoy poop yep worth travel neanderthal explain jet deranged |
| | holy shit cow sherlock crock bull wild umm coolest smh moment late potentially mirror publishes |

Table 4. Most frequent topic and topic category per discipline. n denotes the number of occurrences of the most frequent topic or topic category, followed by its percentage within each discipline.

| Discipline | Most frequent topic | Most frequent category |
|---|---|---|
| Biology | guns killing tory regime austerity disabled british electorate tories jews sick government nazis chemo vulnerable (n=212, 10%) | Science (n=998, 48%) |
| Business | guns killing tory regime austerity disabled british electorate tories jews sick government nazis chemo vulnerable (n=28, 10%) | Science (n=114, 41%) |
| Chemistry | ignorance useless willful supplement predict bitches neurotoxicity adjuvant alum save bcaa rubbish tests aluminum deceit (n=15, 10%) | Science (n=93, 60%) |
| Computer Science | holy shit cow sherlock crock bull wild umm coolest smh moment late potentially mirror publishes (n=28, 8%) | Science (n=176, 53%) |
| Economics | nuclear noncombatants hiroshima revisiting weapons iran makes dumb smartphone dope phone dumber smart politics bonkers (n=10, 8%) | Science (n=54, 43%) |
| Engineering | associated profound views misperceiving cruz rubio bullshit trump conservatism favorable plosone plos drumpf assoc comments (n=2, 15%) | Science (n=8, 62%) |
| Environmental Science | denialist address evidence cited willfully sources scientific paranoid lack free blogs debunks troll disingenuous needed (n=98, 10%) | Science (n=609, 62%) |
| Geography | giant dinosaurs australia pee hole mass frog massive million banned frogs sun birds pandemic close (n=22, 6%) | Science (n=211, 57%) |
| Geology | satellite hockey stick mann dishonest data denialist anomaly moth question michael multiple pretends blonde questions (n=25, 8%) | Science (n=191, 57%) |
| History | denialist address evidence cited willfully sources scientific paranoid lack free blogs debunks troll disingenuous needed (n=3, 12%) | Science (n=12, 48%) |
| Materials Science | fuck fucking cool leopard chimpanzees kidding yeah crispr metal cas urine bitch cephalopods chimps milk (n=10, 19%) | Science (n=40, 74%) |
| Mathematics | reviewed peer published illness paper journal suspects opinion garbage alcohol mental dysbiosis nobel hippopotamus domestic (n=2, 18%) | Science (n=6, 55%) |
| Medicine | guns killing tory regime austerity disabled british electorate tories jews sick government nazis chemo vulnerable (n=320, 83%) | Science (n=1544, 50%) |
| Philosophy | guns killing tory regime austerity disabled british electorate tories jews sick government nazis chemo vulnerable (n=148, 42%) | Political (n=238, 68%) |
| Physics | fuck fucking cool leopard chimpanzees kidding yeah crispr metal cas urine bitch cephalopods chimps milk (n=13, 12%) | Science (n=68, 63%) |
| Political Science | russia you re hypocrite elections smells america uranus baloney sponsored election farts elected infanticide cia (n=54, 8%) | Political (n=292, 45%) |
| Psychology | gender fractions feminist fungus lipid glaciology ants glaciers looks sounds surgeon like sound takes badass (n=92, 4%) | Science (n=995, 42%) |
| Sociology | gender fractions feminist fungus lipid glaciology ants glaciers looks sounds surgeon like sound takes badass (n=17, 8%) | Science (n=98, 44%) |

5.2. Topic Categories describe a wide variety of Controversial Topics

We find that the model with $K=50$ topics had the highest average coherence score among its topics ($c_{v} = .285$). Furthermore, the model was able to maintain a sufficiently high level of topic uniqueness, with a topic uniqueness of 1 implying complete uniqueness between topics. Maintaining a high level of topic uniqueness allows us to obtain more discrete decompositions of offensive tweets into topics for the regression analysis, and further allows us to obtain clearer distinctions among topics.
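
As an illustration of what a uniqueness score of 1 means, the sketch below uses one common definition of topic uniqueness: each top word contributes the inverse of the number of topics whose top words contain it, averaged within and then across topics. This definition is an assumption, since this section does not restate the formula used in the paper, and the example topics are invented.

```python
from collections import Counter

def topic_uniqueness(topics):
    """Mean over topics of the average inverse topic count of their top words.

    Returns 1.0 when no word is shared between topics and approaches
    1 / len(topics) when every topic has the same top words.
    """
    # Number of topics in which each word appears among the top words.
    topic_count = Counter(word for topic in topics for word in set(topic))
    per_topic = [
        sum(1 / topic_count[word] for word in set(topic)) / len(set(topic))
        for topic in topics
    ]
    return sum(per_topic) / len(per_topic)

# Invented top-word lists: the second pair of topics shares one word.
disjoint = [["gene", "gmo"], ["nuclear", "iran"]]
overlapping = [["gene", "gmo"], ["gene", "iran"]]

tu_disjoint = topic_uniqueness(disjoint)
tu_overlapping = topic_uniqueness(overlapping)
```

A lower score signals that several topics are dominated by the same vocabulary, which would make the dominant-topic assignment of a tweet less informative.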

5.2.1. Summary Statistics

Table 3 shows two topics from each topic category, and the number of topics $N$ contained in the topic category. Table 4 shows the most frequent topic and topic category for each academic discipline. We also display the number of tweets $n$ and percentage of tweets within that discipline associated with their most frequent topic and topic category.

5.2.2. Science

We determined that 24 of our topics pertained strongly to discussions of science. Extracting the most representative tweets for the first topic in Table 3 under the category of science reveals pointed discussions of GMOs, herbicides, mental health, diabetes, gene modification, and global warming. These are visibly related to the words “gene”, “gmo”, “midbrain”, “glucose”, and “regulate” found within the topic. The second topic, which is also the most represented topic for offensive tweets that reference an environmental science DOI, is clearly related to denialism of global warming upon digging into its most representative tweets. The most representative offensive tweet of this topic cites an academic article on recent mass extinction in oceans in an argument with another user: “Yes, you are a mindless troll. Not my fault you still lack the intellect + integrity to address scientific research.” This supports previous literature suggesting that toxic language arises in political arguments on social media (Mosleh et al., 2021; Anderson and Huntington, 2017).

Looking at other scientific disciplines and their most frequent topics also shows clear relationships between mentions of controversial topics and the offensive language that occurs within these scientific topics. For example, tweets related to a topic that mentions “crispr” use profane language to express both wariness and excitement about it. Analysis of Geology’s most frequent topic gives further evidence of how offensive language is related to the politicization of science; the abstruse term “hockey stick” in the topic refers to the “hockey stick graph” of temperature anomaly (Marsicek et al., 2018). CDA also lets us analyze the broader conversations in which these tweets occur, including responses that do not reference the original DOI. In response to one offensive tweet that mentions the “hockey stick,” a user writes “Seriously, what are you talking about. I can’t believe you call yourself a climate scientist… The evidence for these warm periods is everywhere, none of this is even disputed. Sigh.” Offensive language in scientific topics seems to occur frequently in debates on both sides of a polarizing subject, as well as in tweets that highlight criticism of or support for new scientific discoveries.

An important takeaway from this category is that we can readily reject the intuition that citing academic articles with aggressive language is primarily used for “debunking” false information. What we find instead is a mixture of both: aggressive language positively highlighted climate change articles but was also used in tweets such as the “hockey stick” tweet, where an article or an excerpt from an article is “cherry-picked” to maintain science denialism.

5.2.3. Race

The two topics pertaining to race in our topic model are represented by discussions of racialized violence. The first topic implies connections between race, ethnicity, and violence (“homicide”); the second is an amalgamation of racialized violence through police and the opioid epidemic. An analysis of the most representative tweets reveals angered discussions on police brutality (“cops are racist shits”), domestic violence, and Trump’s racist policies and language. While many of the tweets reviewed act in the spirit of anti-racist activism, there are several tweets that conflate race with intelligence and crime.

The emergence of actively racist tweets in this sample best indicates how academic articles are reshaped and reimagined to fit existing ideologies from multiple viewpoints. The dialectic between anti-racist and racist tweets highlights that the inflammatory usage of academic articles is not limited to one perspective, and that the weaponization of academic articles is amorphous; similar to the category of science, we find that multiple perspectives exist, and that the articles are used both to push back against and to maintain false and dangerous ideologies.

5.2.4. Religion

Topics related to religion reference Islamophobia, terrorism, and religious fundamentalism. Seemingly random discursive words are embedded into these topics, such as “kitchen sponges” in the first, and “brain damage” in the second. In the most representative sample of tweets belonging to this category of topics, we find only tweets that maintain a single, Islamophobic attitude, often conflating Islam with terrorism. The question of how “brain” is incorporated into the second topic is answered by looking at the most representative tweets; several tweets perform a selective reading of an article to conclude that religious fundamentalism is related to brain damage. This category is thus notable for its homogeneous perspectives, which depart from other categories that reveal different attitudes and ideologies.

5.2.5. Politics

Political topics made up the second largest category of topics. We gave thirteen topics a primary category of politics. Any topic that contained words that signified a branch or segment of the government, or the name of a specific politician, was considered for this topic category. Several of the tweets that are most representative of the first topic refer to nuclear weapons, marijuana legalization, and nuclear energy. These tweets are heavily polarized towards the ends of the political axis; a tweet that mentions identity politics uses “dumb” to describe identity politics as a “manifestation of the left.” Tweets that mention nuclear energy link DOIs related to the efficiency of nuclear energy to push back on conservatives’ propensity towards fossil fuels. In one tweet, the user uses profane language to argue that conservative voters tend to believe misinformation.

Offensive tweets within this category take on both conservative and liberal partisan slants. We categorized as political the topic with the most tweets within the disciplines of biology, business, medicine, and philosophy; this topic, which contains terms such as “guns,” “killing,” “tory,” “electorate,” and “government,” reveals several different political issues once we examine the tweets that represent it. Tweets discuss war, abortion, wealth inequality, healthcare, and gun violence. Several of these tweets anticipate political backlash from conservative spheres; in one tweet, a user writes that an article’s findings will be contested as being written by “snowflakes” or “nazis.” We see language that is commonly viewed as conservative being reappropriated to anticipate hostility; another tweet assigned to this topic suggests that wealth inequality is not just a “Leftist crank issue.” Our qualitative analysis suggests that several of these tweets use offensive words in conjunction with conservative “dog whistles,” which are words that are understood within a specific political context, such as “snowflake” or “libtard.” However, the political undercurrent of the tweets in our corpus suggests a left-leaning political stance; these tweets use such terms not to prove citizenship in conservative networks on Twitter, but rather in anticipation of political backlash. This can be seen as a measure that shields against ad hominem attacks on academics, which have been shown to lessen the validity of claims made by scientists (Barnes et al., 2018). Of the 733 tweets that are assigned this topic category, only a handful expressed concrete conservative sentiments, while most expressed left-leaning sentiments.

5.2.6. Gender and Sexuality

We generated the topic category of Gender and Sexuality from seven different topics. The topics and tweets most strongly aligned with this topic category are heavily critical of misogyny, sexual violence against women, constrictive abortion laws, sexuality-based prejudices, and sexual freedom. Similar to the tweets that are associated with the political topic category, these tweets use aggressive and offensive language to defend feminist practices and studies. Conversely, multiple tweets criticize findings on women’s sexual practices from a conservative slant.

5.2.7. Other

We generated the topic category of “other” from extraneous topics and their associated tweets that did not have an easily visible theme. Tweets that had the highest likelihood of being assigned a topic within these three topics had no clear grouping. However, qualitative analysis of this category’s topics and associated tweets reveals significantly high levels of profanity. Their usage of profanity differs; for example, tweets associated with the first topic use directed offensive language, to borrow from the typology of abusive language of Waseem et al. (Waseem et al., 2017). The second topic, in particular, contains multiple tweets that respond to various academic articles with “holy shit,” which is a generalized form of profanity.

5.3. Offensive Tweets are Retweeted Less, but Faster

5.3.1. Retweet Volume

We perform a negative binomial regression analysis to answer whether offensive language in tweets that reference academic articles increases or decreases their virality. Results indicate that negative sentiment ($\beta$=.045, p < 2e-16), number of followers ($\beta$=1.05, p < 2e-16), presence of a hashtag ($\beta$=.14, p < 2e-16), and offensive language ($\beta$=-.25, p < 2e-16) are significant factors for determining the total volume of retweets for a tweet in our dataset. Our analysis shows that log(followers) and presence of a hashtag are the strongest factors for increasing retweet count. Offensive language, on the other hand, is expected to decrease the retweet count. The strength of a regression coefficient $\beta$ can be evaluated with $e^{\beta}$. An offensive tweet is expected to receive $1 - e^{-.2538}$, or 22.4%, fewer retweets than a non-offensive tweet. Negative sentiment, while significant, affects retweet volume only marginally ($e^{\beta} = 1.046$), and positive sentiment is insignificant.
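
The conversion from a coefficient to a percentage effect can be checked with a few lines of Python. This is a sketch of the arithmetic only, using the two coefficients quoted in the paragraph above; the function name is an illustrative choice.

```python
import math

def percent_change(beta):
    """Percent change in the expected count for a one-unit increase in a
    factor, given its coefficient in a model with a log link."""
    return (math.exp(beta) - 1) * 100

# Coefficients quoted in the text for offensive language and negative sentiment.
offensive_effect = percent_change(-0.2538)
negative_effect = percent_change(0.045)

print(f"offensive language: {offensive_effect:.1f}% change in expected retweets")
print(f"negative sentiment: {negative_effect:.1f}% change in expected retweets")
```

A negative result corresponds to a reduction, so the offensive-language coefficient reproduces the 22.4% decrease reported above, while the negative-sentiment coefficient yields a multiplier of about 1.046.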

Table 5. Regression results for $numRT$
*** significant at .1 percent; ** significant at 1 percent; * significant at 5 percent

| Factor | $\beta$ | std. error | $\Pr(>\lvert z\rvert)$ |
|---|---|---|---|
| hash*** | 0.144315 | 0.004751 | <2e-16 |
| positive | 0.010191 | 0.005651 | 0.0713 |
| negative*** | 0.043042 | 0.006454 | <2.58e-11 |
| log(followers)*** | 1.047963 | 0.008192 | <2e-16 |
| Political* | -0.095680 | 0.044638 | 0.0321 |
| Race | -0.148262 | 0.100263 | 0.1392 |
| Religious | 0.157492 | 0.085436 | 0.0653 |
| Other*** | -0.361750 | 0.074653 | <7.54e-10 |
| Gender and Sexuality*** | -0.329844 | 0.068455 | <1.45e-6 |
| Science*** | -0.384225 | 0.045510 | <2e-16 |

$R^{2}$: .567
N. Observations: 295,078

Table 5 shows the results for our regression analysis when we decompose the offensive language factor into topic categories, as a tweet belonging to a topic category necessarily indicates that it is classified as offensive. We find nearly equivalent regression coefficients for the unchanged factors. We identify that offensive tweets in the topic categories of political ($\beta$=-.10, p = .032), other ($\beta$=-.36, p < 7.54e-10), gender and sexuality ($\beta$=-.33, p < 1.45e-6), and science ($\beta$=-.38, p < 2e-16) are statistically significant, and decrease the number of retweets a tweet receives. Offensive tweets associated with the topic categories of race (p = .139) and religion (p = .065) were not statistically significant at the 5 percent level.

5.3.2. Retweet Speed

To test our independent factors against retweet speed, we filter our dataset for tweets with at least 25 retweets (N=6949). We should note that the number of offensive tweets in this sample is very small (n=54).
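
One way to derive the speed measure for this filtered sample is the time, in minutes, between a tweet and its 25th retweet. The sketch below assumes each tweet comes with a list of retweet timestamps; the function name and the synthetic timestamps are hypothetical and do not reflect the authors' pipeline.

```python
from datetime import datetime, timedelta

def minutes_to_nth_retweet(tweet_time, retweet_times, n=25):
    """Minutes between a tweet and its n-th retweet.

    Returns None when the tweet has fewer than n retweets, which mirrors
    filtering the dataset to tweets with at least n retweets.
    """
    if len(retweet_times) < n:
        return None
    nth_retweet = sorted(retweet_times)[n - 1]
    return (nth_retweet - tweet_time).total_seconds() / 60

# Synthetic example: one retweet every two minutes after posting.
posted = datetime(2019, 1, 1, 12, 0)
retweets = [posted + timedelta(minutes=2 * k) for k in range(1, 31)]

time_rt25 = minutes_to_nth_retweet(posted, retweets)
too_few = minutes_to_nth_retweet(posted, retweets[:10])
```

Tweets for which the function returns None would be dropped before fitting the speed regression.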

Regression analysis for the difference in minutes between the original tweet and its 25th retweet shows that hashtags ($\beta$=.35, p < 8.52e-10), negative sentiment ($\beta$=.18, p = .0038), and positive sentiment ($\beta$=.15, p = .02) increase the time between the original tweet and the 25th retweet, and hence decrease retweet speed. We find that offensive language ($\beta$=-2.06, p < 3.46e-10) and log(followers) ($\beta$=-0.886, p < 2e-16) substantially increase retweet speed, shortening the time to the 25th retweet by nearly 87% and 59%, respectively.
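
The two percentages follow from exponentiating the coefficients, since each coefficient acts as a multiplicative factor $e^{\beta}$ on the time to the 25th retweet. As a worked check, with values rounded to three decimals:

```latex
\begin{align*}
\text{offensive language: } & 1 - e^{-2.06} \approx 1 - 0.127 = 0.873 \quad (\text{about } 87\% \text{ less time}) \\
\text{log(followers): }     & 1 - e^{-0.886} \approx 1 - 0.412 = 0.588 \quad (\text{about } 59\% \text{ less time})
\end{align*}
```

By the same reading, a positive coefficient lengthens the time: for hashtags, $e^{0.35} \approx 1.42$, or roughly 42% more time to reach the 25th retweet.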

6. Discussion

Our analysis of offensive language in tweets that reference academic articles confirms intuitions about the entanglement of offensive language and the politicization of science. Our CDA and topic modeling confirm several outstanding findings from previous literature. Firstly, they bolster previous research that finds offensive language within politicized contexts on Twitter (Anderson and Huntington, 2017; Mosleh et al., 2021; Kong et al., 2022). Secondly, they reveal highly varied subjects of political contention, including but not limited to global warming, racialized violence, abortion, and sexuality. We further emphasize that topic categorizations are not completely discrete; by the nature of topic modeling, certain topics can contain different themes that can be split between the topic categories we interpreted. However, this inability of our generated topics to be categorically distinct within our topic categories reflects the shared political undercurrent carried in each of our topic categories. Outside of our topic category of “Other,” our topic categories of Race, Science, Politics, and Gender and Sexuality cannot be disentangled from politics.

Specifically, our results indicate that offensive language is weaponized at a significant rate across many disciplines, and for a wide variety of uses. This reveals that science dissemination, at the fringes, may have dangerous characteristics that have not been identified before. Firstly, we see that this type of dissemination of science carries strains of racism, science denialism, homophobia, gender discrimination, and Islamophobia that reinterpret and selectively cite academic articles. The fact that these articles are used for argumentation should be a concern; these offensive tweets reach a level of virality more quickly than non-offensive tweets (though they reach a narrower audience). This calls for more investigation of how science is being cited in discursive arenas on social media as views on the credibility of science are changing.

Though diffusion of academic articles is vital for researchers’ success, academics should not be attacked with offensive language or have their works weaponized in uncivil attacks of the kind our CDA has shown. Academia is not a context where the axiom “any publicity is good publicity” applies, especially when that publicity comes in conjunction with abusive and offensive language; research has found that ad hominem attacks and the use of abusive language in scientific discussions lead to lower perceived credibility (Barnes et al., 2018; König and Jucks, 2019). Favorably, our results indicate that offensive tweets referencing an academic DOI reach a smaller audience than non-offensive tweets. However, offensive tweets in our dataset that do reach a level of virality do so much more quickly than non-offensive tweets.

7. Conclusion

This work conducts mixed-method analyses of offensive tweets that reference academic articles, and the relationship between offensive tweets, politicization of science, and virality. We use sentiment and offensive language classification to annotate our dataset of tweets, and then perform critical discourse analysis over our generated topic models to contextualize our topics with the broader theme of the politicization of science. Our critical discourse analysis reveals that offensive language is heavily utilized in politicized messages regarding topics such as race, science, gender and sexuality, and religion. Lastly, results from our regression analysis show that offensive tweets that reference academic DOIs are diffused at lower volumes, but diffuse rapidly when they do go viral.

7.1. Limitations

We should note that our dataset contains only tweets from the 9,650 most mentioned articles on Altmetric, with the most recent tweet in the dataset stemming from nearly 4 years ago. This presents several limitations. Firstly, selecting tweets mentioning only popular articles may imply higher quality articles from more reputable authors and publications, which may imply lower inherent frequencies of offensive language in the tweets that reference them. Secondly, the current political atmosphere of “X”, formerly known as Twitter, may not be well represented in our study. Several large events have occurred that may drastically change the behaviors of offensive language on Twitter and are not captured in our dataset, such as the COVID-19 pandemic, the resurgence of Black Lives Matter during the George Floyd protests, and the 2020 American election. Though our dataset does not contain these events, they have clear relationships with our work’s findings. Lastly, Twitter has changed hands: Elon Musk took over Twitter in October 2022, rebranded it to “X”, and changed several of the platform’s rules and guidelines.

Acknowledgements.
We would like to acknowledge for their continued support and many reviews of this paper from its infancy to where it is now.

References

  • AltMetric (2023) AltMetric. 2023. Altmetric Support. https://help.altmetric.com/support/home
  • Anderson and Huntington (2017) Ashley A Anderson and Heidi E Huntington. 2017. Social media, science, and attack discourse: How Twitter discussions of climate change use sarcasm and incivility. Science Communication 39, 5 (2017), 598–620.
  • Antypas et al. (2023) Dimosthenis Antypas, Alun Preece, and Jose Camacho-Collados. 2023. Negativity spreads faster: A large-scale multilingual twitter analysis on the role of sentiment in political communication. Online Social Networks and Media 33 (2023), 100242.
  • Aranda et al. (2021) Ana M Aranda, Kathrin Sele, Helen Etchanchu, Jonne Y Guyt, and Eero Vaara. 2021. From big data to rich theory: Integrating critical discourse analysis with structural topic modeling. European Management Review 18, 3 (2021), 197–214.
  • Barnes et al. (2018) Ralph M Barnes, Heather M Johnston, Noah MacKenzie, Stephanie J Tobin, and Chelsea M Taglang. 2018. The effect of ad hominem attacks on the evaluation of claims promoted by scientists. PLoS One 13, 1 (2018), e0192025.
  • Blei et al. (2003) David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
  • Buchanan (2013) Karen S. Buchanan. 2013. Contested discourses, knowledge, and socio-environmental conflict in Ecuador. Environmental Science & Policy 30 (2013), 19–25. https://doi.org/10.1016/j.envsci.2012.12.012 SI: Environmental and Developmental Discourses: Technical knowledge, discursive spaces and politics.
  • Camacho-Collados et al. (2022) Jose Camacho-Collados, Kiamehr Rezaee, Talayeh Riahi, Asahi Ushio, Daniel Loureiro, Dimosthenis Antypas, Joanne Boisson, Luis Espinosa-Anke, Fangyu Liu, Eugenio Martínez-Cámara, Gonzalo Medina, Thomas Buhrmann, Leonardo Neves, and Francesco Barbieri. 2022. TweetNLP: Cutting-Edge Natural Language Processing for Social Media. arXiv:2206.14774 [cs.CL]
  • Chatzakou et al. (2017) Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, and Athena Vakali. 2017. Mean Birds: Detecting Aggression and Bullying on Twitter. In Proceedings of the 2017 ACM on Web Science Conference (Troy, New York, USA) (WebSci ’17). Association for Computing Machinery, New York, NY, USA, 13–22. https://doi.org/10.1145/3091478.3091487
  • Chen et al. (2012) Ying Chen, Yilu Zhou, Sencun Zhu, and Heng Xu. 2012. Detecting Offensive Language in Social Media to Protect Adolescent Online Safety. In 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing. 71–80. https://doi.org/10.1109/SocialCom-PASSAT.2012.55
  • Chinn et al. (2020) Sedona Chinn, P Sol Hart, and Stuart Soroka. 2020. Politicization and polarization in climate change news content, 1985-2017. Science Communication 42, 1 (2020), 112–129.
  • Dunlap and McCright (2010) Riley E Dunlap and Aaron M McCright. 2010. Climate change denial: Sources, actors and strategies. In Routledge handbook of climate change and society. Routledge, 240–259.
  • Evkoski et al. (2021) Bojan Evkoski, Nikola Ljubešić, Andraž Pelicon, Igor Mozetič, and Petra Kralj Novak. 2021. Evolution of topics and hate speech in retweet network communities. Applied Network Science 6, 1 (2021), 1–20.
  • Fire and Guestrin (2019) Michael Fire and Carlos Guestrin. 2019. Over-optimization of academic publishing metrics: observing Goodhart’s Law in action. GigaScience 8, 6 (2019), giz053.
  • Fortuna and Nunes (2018) Paula Fortuna and Sérgio Nunes. 2018. A Survey on Automatic Detection of Hate Speech in Text. ACM Comput. Surv. 51, 4, Article 85 (jul 2018), 30 pages. https://doi.org/10.1145/3232676
  • Gauchat (2012) Gordon Gauchat. 2012. Politicization of science in the public sphere: A study of public trust in the United States, 1974 to 2010. American sociological review 77, 2 (2012), 167–187.
  • Guerini et al. (2012) Marco Guerini, Alberto Pepe, and Bruno Lepri. 2012. Do linguistic style and readability of scientific abstracts affect their virality?. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 6. 475–478.
  • Guerini et al. (2021) Marco Guerini, Carlo Strapparava, and Gozde Ozbal. 2021. Exploring Text Virality in Social Networks. Proceedings of the International AAAI Conference on Web and Social Media 5, 1 (Aug. 2021), 506–509. https://doi.org/10.1609/icwsm.v5i1.14169
  • Hasan et al. (2022) Rakibul Hasan, Cristobal Cheyre, Yong-Yeol Ahn, Roberto Hoyle, and Apu Kapadia. 2022. The Impact of Viral Posts on Visibility and Behavior of Professionals: A Longitudinal Study of Scientists on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16. 323–334.
  • Hmielowski et al. (2014) Jay D Hmielowski, Lauren Feldman, Teresa A Myers, Anthony Leiserowitz, and Edward Maibach. 2014. An attack on science? Media use, trust in scientists, and perceptions of global warming. Public Understanding of Science 23, 7 (2014), 866–883.
  • Hong and Davison (2010) Liangjie Hong and Brian D Davison. 2010. Empirical study of topic modeling in twitter. In Proceedings of the first workshop on social media analytics. 80–88.
  • Jacobs and Tschötschel (2019) Thomas Jacobs and Robin Tschötschel. 2019. Topic models meet discourse analysis: a quantitative tool for a qualitative approach. International Journal of Social Research Methodology 22, 5 (2019), 469–485.
  • Johnson and McLean (2020) Melissa N.P. Johnson and Ethan McLean. 2020. Discourse Analysis. In International Encyclopedia of Human Geography (Second Edition) (second edition ed.), Audrey Kobayashi (Ed.). Elsevier, Oxford, 377–383. https://doi.org/10.1016/B978-0-08-102295-5.10814-5
  • Kong et al. (2022) Quyu Kong, Emily Booth, Francesco Bailo, Amelia Johns, and Marian-Andrei Rizoiu. 2022. Slipping to the Extreme: A Mixed Method to Explain How Extreme Opinions Infiltrate Online Discussions. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16. 524–535.
  • König and Jucks (2019) Lars König and Regina Jucks. 2019. Hot topics in science communication: Aggressive language decreases trustworthiness and credibility in scientific debates. Public Understanding of Science 28, 4 (2019), 401–416.
  • Kullar et al. (2020) Ravina Kullar, Debra A Goff, Timothy P Gauthier, and Tara C Smith. 2020. To tweet or not to tweet—a review of the viral power of twitter for infectious diseases. Current Infectious Disease Reports 22 (2020), 1–6.
  • Luc et al. (2021) Jessica GY Luc, Michael A Archer, Rakesh C Arora, Edward M Bender, Arie Blitz, David T Cooke, Tamara Ni Hlci, Biniam Kidane, Maral Ouzounian, Thomas K Varghese Jr, et al. 2021. Does tweeting improve citations? One-year results from the TSSMN prospective randomized trial. The Annals of thoracic surgery 111, 1 (2021), 296–300.
  • Marsicek et al. (2018) Jeremiah Marsicek, Bryan N Shuman, Patrick J Bartlein, Sarah L Shafer, and Simon Brewer. 2018. Reconciling divergent trends and millennial variations in Holocene temperatures. Nature 554, 7690 (2018), 92–96.
  • Mosleh et al. (2021) Mohsen Mosleh, Cameron Martel, Dean Eckles, and David Rand. 2021. Perverse downstream consequences of debunking: Being corrected by another user for posting false political news increases subsequent sharing of low quality, partisan, and toxic content in a Twitter field experiment. In proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–13.
  • Naveed et al. (2011) Nasir Naveed, Thomas Gottron, Jérôme Kunegis, and Arifah Che Alhadi. 2011. Bad news travel fast: A content-based analysis of interestingness on twitter. In Proceedings of the 3rd international web science conference. 1–7.
  • Ortega (2016) José Luis Ortega. 2016. To be or not to be on Twitter, and its relationship with the tweeting and citation of research papers. Scientometrics 109 (2016), 1353–1364.
  • Pivecka et al. (2022) Niklas Pivecka, Roja Alexandra Ratzinger, and Arnd Florack. 2022. Emotions and virality: Social transmission of political messages on Twitter. Frontiers in Psychology 13 (2022).
  • Posner et al. (2005) Jonathan Posner, James A Russell, and Bradley S Peterson. 2005. The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and psychopathology 17, 3 (2005), 715–734.
  • Priem et al. (2022) Jason Priem, Heather Piwowar, and Richard Orr. 2022. OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv preprint arXiv:2205.01833 (2022).
  • Schmidt and Wiegand (2017) Anna Schmidt and Michael Wiegand. 2017. A Survey on Hate Speech Detection using Natural Language Processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics, Valencia, Spain, 1–10. https://doi.org/10.18653/v1/W17-1101
  • Silva et al. (2016) Leandro Silva, Mainack Mondal, Denzil Correa, Fabrício Benevenuto, and Ingmar Weber. 2016. Analyzing the Targets of Hate in Online Social Media. Proceedings of the International AAAI Conference on Web and Social Media 10 (03 2016). https://doi.org/10.1609/icwsm.v10i1.14811
  • Stieglitz and Dang-Xuan (2013) Stefan Stieglitz and Linh Dang-Xuan. 2013. Emotions and Information Diffusion in Social Media — Sentiment of Microblogs and Sharing Behavior. Journal of Management Information Systems 29 (04 2013), 217–248. https://doi.org/10.2753/MIS0742-1222290408
  • Suh et al. (2010) Bongwon Suh, Lichan Hong, Peter Pirolli, and Ed H Chi. 2010. Want to be retweeted? large scale analytics on factors impacting retweet in twitter network. In 2010 IEEE second international conference on social computing. IEEE, 177–184.
  • Sutton et al. (2015) Jeannette Sutton, C Gibson, Nolan Phillips, Emma Spiro, Cedar League, Britta Johnson, Sean Fitzhugh, and Carter Butts. 2015. A Cross-Hazard Analysis of Terse Message Retransmission on Twitter. Proceedings of the National Academy of Sciences 112 (11 2015). https://doi.org/10.1073/pnas.1508916112
  • Tsugawa and Ohsaki (2015) Sho Tsugawa and Hiroyuki Ohsaki. 2015. Negative Messages Spread Rapidly and Widely on Social Media. In Proceedings of the 2015 ACM on Conference on Online Social Networks (Palo Alto, California, USA) (COSN ’15). Association for Computing Machinery, New York, NY, USA, 151–160. https://doi.org/10.1145/2817946.2817962
  • Tsugawa and Ohsaki (2017) Sho Tsugawa and Hiroyuki Ohsaki. 2017. On the relation between message sentiment and its virality on social media. Social Network Analysis and Mining 7 (05 2017), 19. https://doi.org/10.1007/s13278-017-0439-0
  • Valero (2023) Myriam Vidal Valero. 2023. Thousands of scientists are cutting back on Twitter, seeding angst and uncertainty. Nature 620, 7974 (2023), 482–484.
  • Waseem et al. (2017) Zeerak Waseem, Thomas Davidson, Dana Warmsley, and Ingmar Weber. 2017. Understanding Abuse: A Typology of Abusive Language Detection Subtasks. In Proceedings of the First Workshop on Abusive Language Online. Association for Computational Linguistics, Vancouver, BC, Canada, 78–84. https://doi.org/10.18653/v1/W17-3012
  • Wu et al. (2020) Xiaobao Wu, Chunping Li, Yan Zhu, and Yishu Miao. 2020. Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 1772–1782. https://doi.org/10.18653/v1/2020.emnlp-main.138
  • Zafra et al. (2021) Salud María Zafra, Antonio Sáez-Castillo, Antonio Conde-Sánchez, and Maria Martín-Valdivia. 2021. How do sentiments affect virality on Twitter? Royal Society Open Science 8 (04 2021). https://doi.org/10.1098/rsos.201756
  • Zakhlebin and Horvát (2020) Igor Zakhlebin and Emőke-Ágnes Horvát. 2020. Diffusion of scientific articles across online platforms. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 14. 762–773.
  • Zampieri et al. (2019) Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019. SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation. Association for Computational Linguistics, Minneapolis, Minnesota, USA, 75–86. https://doi.org/10.18653/v1/S19-2010