
doi 10.1371/journal.pone.0265602

Retweet communities reveal the main sources of hate speech

Authors: Bojan Evkoski, Andraz Pelicon, Igor Mozetic, Nikola Ljubesic, Petra Kralj Novak

Abstract: We address the challenging problem of identifying the main sources of hate speech on Twitter. On the one hand, we carefully annotate a large set of tweets for hate speech and deploy advanced deep learning to produce high-quality hate speech classification models. On the other hand, we create retweet networks, detect communities, and monitor their evolution through time. This combined approach is applied to three years of Slovenian Twitter data. We report a number of interesting results. Hate speech is dominated by offensive tweets related to political and ideological issues. The share of unacceptable tweets increases moderately over time, from an initial 20% to 30% by the end of 2020. Unacceptable tweets are retweeted significantly more often than acceptable tweets. About 60% of unacceptable tweets are produced by a single right-wing community of only moderate size. Institutional and media Twitter accounts post significantly fewer unacceptable tweets than individual accounts. In fact, the main sources of unacceptable tweets are anonymous accounts and accounts that were suspended or closed during the years 2018-2020.

Submitted 17 March, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

Journal ref: B. Evkoski, A. Pelicon, I. Mozetič, N. Ljubešić, P. Kralj Novak. Retweet communities reveal the main sources of hate speech, PLoS ONE 17(3): e0265602, 2022

arXiv:2105.14005

Online Hate: Behavioural Dynamics and Relationship with Misinformation

Authors: Matteo Cinelli, Andraž Pelicon, Igor Mozetič, Walter Quattrociocchi, Petra Kralj Novak, Fabiana Zollo

Abstract: Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making the development of appropriate countermeasures necessary. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos through a machine learning model fine-tuned on a large set of hand-annotated data. Our analysis shows that there is no evidence of the presence of "serial haters", understood as active users posting exclusively hateful comments. Moreover, consistent with the echo chamber hypothesis, we find that users skewed towards one of the two categories of video channels (questionable, reliable) are more prone to use inappropriate, violent, or hateful language within their opponents' community. Interestingly, users loyal to reliable sources use, on average, more toxic language than their counterparts. Finally, we find that the overall toxicity of the discussion increases with its length, measured both in terms of number of comments and time. Our results show that, consistent with Godwin's law, online debates tend to degenerate towards increasingly toxic exchanges of views.

Submitted 28 May, 2021; originally announced May 2021.

arXiv:2105.06214

doi 10.1371/journal.pone.0256175

Community evolution in retweet networks

Authors: Bojan Evkoski, Igor Mozetic, Nikola Ljubesic, Petra Kralj Novak

Abstract: Communities in social networks often reflect close social ties between their members, and they evolve through time. We propose an approach that tracks two aspects of community evolution in retweet networks: the flow of members into, out of, and between the communities, and their influence. We start with high-resolution time windows, and then select several timepoints which exhibit large differences between the communities. For community detection, we propose a two-stage approach. In the first stage, we apply an enhanced Louvain algorithm, called Ensemble Louvain, to find stable communities. In the second stage, we form influence links between these communities and identify linked super-communities. For the detected communities, we compute internal and external influence, and for individual users, the retweet h-index influence. We apply the proposed approach to three years of Twitter data of all Slovenian tweets. The analysis shows that the Slovenian tweetosphere is dominated by politics, and that the left-leaning communities are larger, but the right-leaning communities and users exhibit significantly higher impact. An interesting observation is that retweet networks change relatively gradually, despite such events as the emergence of the Covid-19 pandemic or the change of government.

Submitted 2 September, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

Journal ref: PLoS ONE 16(9): e0256175, 2021
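The retweet h-index mentioned in the abstract above can be illustrated with a short sketch. It assumes the measure mirrors the citation h-index: a user has index h if h of their tweets were each retweeted at least h times. The function name `retweet_h_index` and the input format (one retweet count per tweet) are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def retweet_h_index(retweet_counts: Sequence[int]) -> int:
    """Return the largest h such that at least h tweets were each retweeted >= h times.

    Assumption: this mirrors the citation h-index, with retweets in place of citations.
    """
    # Rank tweets from most to least retweeted.
    ranked = sorted(retweet_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` tweets all have at least `rank` retweets
        else:
            break  # counts are non-increasing, so no larger h is possible
    return h


# Hypothetical user with five tweets.
print(retweet_h_index([10, 8, 5, 4, 3]))  # 4
```

Unlike a raw retweet total, a measure of this kind rewards users whose influence is spread over many tweets rather than concentrated in a single viral post.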

arXiv:1912.10795

(Mis)Information Operations: An Integrated Perspective

Authors: Matteo Cinelli, Mauro Conti, Livio Finos, Francesco Grisolia, Petra Kralj Novak, Antonio Peruzzi, Maurizio Tesconi, Fabiana Zollo, Walter Quattrociocchi

Abstract: The massive diffusion of social media fosters disintermediation and changes the way users are informed, the way they process reality, and the way they engage in public debate. The cognitive layer of users and the related social dynamics define the nature and the dimension of informational threats. Users tend to interact with information adhering to their preferred narrative and to ignore dissenting information. Confirmation bias seems to account for users' decisions about consuming and spreading content; at the same time, the aggregation of favored information within those communities reinforces group polarization. In this work, the authors address the problem of (mis)information operations with a holistic and integrated approach. The cognitive weaknesses induced by this new information environment are considered. Moreover, (mis)information operations, with particular reference to the Italian context, are examined, and the fact that the phenomenon is more complex than expected is highlighted. The paper concludes by providing an integrated research roadmap accounting for possible future technological developments.

Submitted 23 December, 2019; originally announced December 2019.

Comments: The paper first appeared in Volume 18, Issue 3 of the Journal of Information Warfare

Journal ref: Journal of Information Warfare (2019) 18.2: 83-98

arXiv:1912.01072

Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift

Authors: Matej Martinc, Petra Kralj Novak, Senja Pollak

Abstract: We propose a new method that leverages contextual embeddings for the task of diachronic semantic shift detection by generating time-specific word representations from BERT embeddings. The results of our experiments on the domain-specific LiverpoolFC corpus suggest that the proposed method has performance comparable to the current state of the art without requiring any time-consuming domain adaptation on large corpora. The results on the newly created Brexit news corpus suggest that the method can be successfully used for the detection of short-term yearly semantic shift. Lastly, the model also shows promising results in a multilingual setting, where the task was to detect differences and similarities between diachronic semantic shifts in different languages.

Submitted 5 March, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

Comments: Accepted to Language Resources and Evaluation (LREC 2020)
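One reading of the "time-specific word representations" in the abstract above is to average the contextual embeddings of all occurrences of a word within each time slice, then measure the cosine distance between consecutive slices. The sketch below assumes the token embeddings have already been extracted (for example from BERT) and uses toy two-dimensional vectors; the helper names are hypothetical and the sketch is not presented as the authors' exact pipeline.

```python
import math
from typing import List, Sequence

Vector = List[float]


def average_embedding(occurrences: Sequence[Vector]) -> Vector:
    """Time-specific representation: mean of a word's contextual embeddings in one slice."""
    dim = len(occurrences[0])
    return [sum(vec[i] for vec in occurrences) / len(occurrences) for i in range(dim)]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_shift(slices: Sequence[Sequence[Vector]]) -> List[float]:
    """Cosine distance between the representations of consecutive time slices."""
    reps = [average_embedding(s) for s in slices]
    return [cosine_distance(a, b) for a, b in zip(reps, reps[1:])]


# Toy embeddings of one word in three time slices (hypothetical values).
slice_2016 = [[1.0, 0.0], [0.8, 0.2]]
slice_2017 = [[0.9, 0.1]]
slice_2018 = [[0.0, 1.0], [0.2, 0.8]]
print(semantic_shift([slice_2016, slice_2017, slice_2018]))
```

Under this reading, a larger distance between two slices indicates a larger change in how the word is used; here the first pair of slices is nearly identical, while the last slice has clearly drifted.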

arXiv:1804.02233

Forex trading and Twitter: Spam, bots, and reputation manipulation

Authors: Igor Mozetič, Peter Gabrovšek, Petra Kralj Novak

Abstract: Currency trading (Forex) is the largest world market in terms of volume. We analyze trading and tweeting about the EUR-USD currency pair over a period of three years. First, a large number of tweets were manually labeled, and a Twitter stance classification model was constructed. The model then classifies all the tweets by the trading stance signal: buy, hold, or sell (EUR vs. USD). The Twitter stance is compared to the actual currency rates by applying the event study methodology, well known in financial economics. It turns out that there are large differences in Twitter stance distribution and potential trading returns between four groups of Twitter users: trading robots, spammers, trading companies, and individual traders. Additionally, we observe attempts at reputation manipulation by post festum removal of tweets with poor predictions, and by deleting and reposting identical tweets to increase visibility without tainting one's Twitter timeline.

Submitted 16 April, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

Comments: MIS2: Misinformation and Misbehavior Mining on the Web, Workshop at WSDM-18, Marina Del Rey, CA, USA, Feb. 9, 2018
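The event study methodology mentioned in the Forex abstract compares returns around an event with a "normal" return estimated beforehand. The minimal sketch below uses the constant-mean-return model, which is an assumption on our part since the abstract specifies neither the model nor the window lengths; the function name and parameters are hypothetical. An "event" could be, for instance, a day with a peak in buy-stance tweets.

```python
from statistics import mean
from typing import Sequence


def cumulative_abnormal_return(
    returns: Sequence[float],
    event_index: int,
    estimation_window: int,
    event_window: int,
) -> float:
    """Cumulative abnormal return (CAR) under a constant-mean-return model.

    returns: per-period returns of the asset (e.g. the EUR-USD rate).
    event_index: position of the event in `returns`.
    estimation_window: periods before the event used to estimate the normal return.
    event_window: periods, starting at the event, over which abnormal returns are summed.
    """
    if event_index - estimation_window < 0 or event_index + event_window > len(returns):
        raise ValueError("windows extend beyond the available returns")
    # Normal (expected) return: mean return over the estimation window.
    normal = mean(returns[event_index - estimation_window:event_index])
    # Abnormal return = actual return minus normal return; CAR is their sum.
    return sum(r - normal for r in returns[event_index:event_index + event_window])


daily_returns = [0.001, -0.001, 0.002, 0.0, 0.004, 0.003]  # hypothetical values
print(cumulative_abnormal_return(daily_returns, event_index=4, estimation_window=4, event_window=2))
```

A consistently positive CAR after buy-stance events (and negative after sell-stance events) would suggest the tweets carry predictive signal; richer event studies typically replace the constant mean with a market model.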

arXiv:1509.07761

doi 10.1371/journal.pone.0144296

Sentiment of Emojis

Authors: Petra Kralj Novak, Jasmina Smailović, Borut Sluban, Igor Mozetič

Abstract: There is a new generation of emoticons, called emojis, that is increasingly being used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional content, there are hundreds of emojis. But what is their emotional content? We provide the first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of each emoji is computed from the sentiment of the tweets in which it occurs. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of the emojis allows us to draw several interesting conclusions. It turns out that most of the emojis are positive, especially the most popular ones. The sentiment distribution of the tweets with and without emojis is significantly different. The inter-annotator agreement on the tweets with emojis is higher. Emojis tend to occur at the end of the tweets, and their sentiment polarity increases with the distance. We observe no significant differences in the emoji rankings between the 13 languages and the Emoji Sentiment Ranking. Consequently, we propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and a novel visualization in the form of a sentiment bar.

Submitted 8 December, 2015; v1 submitted 25 September, 2015; originally announced September 2015.

Journal ref: PLoS ONE 10(12): e0144296, 2015
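The Sentiment of Emojis abstract states that an emoji's sentiment is computed from the sentiment of the tweets in which it occurs. A minimal sketch, assuming the score is the mean polarity of those tweets with negative, neutral, and positive mapped to -1, 0, and +1, is shown below. The paper's exact estimator (for instance, any smoothing of the label probabilities) may differ, and the input format, with emojis already extracted from each tweet, is a hypothetical simplification.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed mapping of annotated sentiment labels to numeric polarity.
POLARITY = {"negative": -1, "neutral": 0, "positive": 1}


def emoji_sentiment_scores(tweets: Iterable[Tuple[List[str], str]]) -> Dict[str, float]:
    """Mean polarity of the tweets each emoji occurs in, counted per occurrence.

    tweets: (emojis_in_tweet, sentiment_label) pairs.
    """
    polarity_sum: Dict[str, int] = defaultdict(int)
    occurrences: Counter = Counter()
    for emojis, label in tweets:
        value = POLARITY[label]
        for emoji in emojis:
            polarity_sum[emoji] += value
            occurrences[emoji] += 1
    # Each score lies in [-1, 1]: -1 if every occurrence is in a negative tweet, +1 if all positive.
    return {emoji: polarity_sum[emoji] / n for emoji, n in occurrences.items()}


annotated = [(["😂"], "positive"), (["😂", "❤"], "positive"), (["😭"], "negative"), (["😂"], "neutral")]
print(emoji_sentiment_scores(annotated))
```

In this toy input, 😂 appears in two positive tweets and one neutral tweet, so its score is 2/3, about 0.67. Ranking emojis by such a score gives a lexicon in the spirit of the Emoji Sentiment Ranking.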

arXiv:1508.00027

Analysis of Financial News with NewsStream

Authors: Petra Kralj Novak, Miha Grcar, Borut Sluban, Igor Mozetic

Abstract: Unstructured data, such as news and blogs, can provide valuable insights into the financial world. We present the NewsStream portal, an intuitive and easy-to-use tool for news analytics, which supports interactive querying and visualizations of the documents at different levels of detail. It relies on a scalable architecture for real-time processing of a continuous stream of textual data, which incorporates data acquisition, cleaning, natural-language preprocessing, and semantic annotation components. It has been running for over two years and has collected over 18 million news articles and blog posts. The NewsStream portal can be used to answer questions about when, how often, in what context, and with what sentiment a financial entity or term was mentioned in a continuous stream of news and blogs, thereby complementing news aggregators. We illustrate some features of our system in use cases covering the relations between the rating agencies and the PIIGS countries, the reflection of financial news in credit default swap (CDS) prices, the emergence of the Bitcoin digital currency, and visualizing how the world is connected through news.

Submitted 7 November, 2015; v1 submitted 31 July, 2015; originally announced August 2015.

Report number: IJS-DP-11965

arXiv:1505.08001

doi 10.1371/journal.pone.0138740

Emotional Dynamics in the Age of Misinformation

Authors: Fabiana Zollo, Petra Kralj Novak, Michela Del Vicario, Alessandro Bessi, Igor Mozetic, Antonio Scala, Guido Caldarelli, Walter Quattrociocchi

Abstract: According to the World Economic Forum, the diffusion of unsubstantiated rumors on online social media is one of the main threats to our society. The disintermediated paradigm of content production and consumption on online social media might foster the formation of homophile communities (echo chambers) around specific worldviews. Such a scenario has been shown to be a fertile environment for the diffusion of false claims, in particular with respect to conspiracy theories. Viral phenomena not infrequently trigger naive (and funny) social responses, e.g., the recent case of Jade Helm 15, where a simple military exercise was perceived as the beginning of a civil war in the US. In this work, we address the emotional dynamics of collective debates around distinct kinds of news (science and conspiracy news), both inside and across their respective polarized communities. Our findings show that comments on conspiracy posts tend to be more negative than those on science posts. However, the more engaged users are, the more they tend towards negative commenting (on both science and conspiracy posts). Finally, zooming in on the interaction among polarized communities, we find a general negative pattern: as the number of comments increases (i.e., the discussion becomes longer), the sentiment of the post becomes more and more negative.

Submitted 29 May, 2015; originally announced May 2015.

Journal ref: PLoS ONE, 10(9): e0138740 (2015)

arXiv:1406.5323

doi 10.1371/journal.pone.0099515

Extraction of Temporal Networks from Term Co-occurrences in Online Textual Sources

Authors: Marko Popović, Hrvoje Štefančić, Borut Sluban, Petra Kralj Novak, Miha Grčar, Igor Mozetič, Michelangelo Puliga, Vinko Zlatić

Abstract: A stream of unstructured news can be a valuable source of hidden relations between different entities, such as financial institutions, countries, or persons. We present an approach to continuously collect online news, recognize relevant entities in them, and extract time-varying networks. The nodes of the network are the entities, and the links are their co-occurrences. We present a method to estimate the significance of co-occurrences, and a benchmark model against which their robustness is evaluated. The approach is applied to a large set of financial news, collected over a period of two years. The entities we consider are 50 countries which issue sovereign bonds, which are in turn insured by Credit Default Swaps (CDS). We compare the country co-occurrence networks to the CDS networks constructed from the correlations between the CDS. The results show a relatively small but significant overlap between the networks extracted from the news and those from the CDS correlations.

Submitted 20 June, 2014; originally announced June 2014.

Comments: 27 pages, 12 figures

Journal ref: PLoS ONE 9(12): e99515 (2014)
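The core construction in the temporal-networks abstract (entities as nodes, co-occurrences as links) can be sketched as follows for a single time window. The sketch does not reproduce the paper's significance method or benchmark model; as a simple stand-in it reports, for each pair, the observed number of co-mentioning articles and its ratio to the count expected if the two entities appeared independently. The function name and input format are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_network(
    documents: Iterable[Set[str]],
) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Build weighted co-occurrence links from the entities recognised in each article.

    Returns {(entity_a, entity_b): (observed_count, observed / expected)}, where
    expected = n_a * n_b / N assumes independent occurrence (a stand-in baseline).
    """
    n_docs = 0
    entity_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for entities in documents:
        n_docs += 1
        unique = sorted(set(entities))  # sort so each pair has one canonical order
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    network = {}
    for (a, b), observed in pair_counts.items():
        expected = entity_counts[a] * entity_counts[b] / n_docs
        network[(a, b)] = (observed, observed / expected)
    return network


# Hypothetical articles from one time window, reduced to the countries they mention.
window = [{"Greece", "Italy"}, {"Greece", "Italy", "Spain"}, {"Greece"}, {"Spain"}]
for pair, (count, lift) in sorted(cooccurrence_network(window).items()):
    print(pair, count, round(lift, 2))
```

Repeating this over consecutive time windows yields a sequence of networks of the time-varying kind the abstract describes. A ratio above 1 only means more co-mentions than the independence baseline; it is not by itself a test of statistical significance.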

arXiv:1402.3483

doi 10.1038/srep05038

News Cohesiveness: an Indicator of Systemic Risk in Financial Markets

Authors: Matija Piškorec, Nino Antulov-Fantulin, Petra Kralj Novak, Igor Mozetič, Miha Grčar, Irena Vodenska, Tomislav Šmuc

Abstract: Motivated by recent financial crises, significant research efforts have been put into studying contagion effects and herding behaviour in financial markets. Much less has been said about the influence of financial news on financial markets. We propose a novel measure of collective behaviour in financial news on the Web, the News Cohesiveness Index (NCI), and show that it can be used as a systemic risk indicator. We evaluate the NCI on financial documents from large Web news sources on a daily basis from October 2011 to July 2013 and analyse the interplay between financial markets and finance-related news. We hypothesized that strong cohesion in financial news reflects movements in the financial markets. Cohesiveness is a more general and robust measure of systemic risk expressed in news than measures based on simple occurrences of specific terms. Our results indicate that cohesiveness in the financial news is highly correlated with and driven by volatility on the financial markets.

Submitted 14 February, 2014; originally announced February 2014.

Journal ref: Scientific Reports 4: 5038 (2014)

Showing 1–11 of 11 results for author: Novak, P K