Search | arXiv e-print repository

doi 10.14746/amup.9788323241775

XAI in Computational Linguistics: Understanding Political Leanings in the Slovenian Parliament

Abstract: The work covers the development and explainability of machine learning models for predicting political leanings through parliamentary transcriptions. We concentrate on the Slovenian parliament and the heated debate on the European migrant crisis, with transcriptions from 2014 to 2020. We develop both classical machine learning and transformer language models to predict the left- or right-leaning o… ▽ More The work covers the development and explainability of machine learning models for predicting political leanings through parliamentary transcriptions. We concentrate on the Slovenian parliament and the heated debate on the European migrant crisis, with transcriptions from 2014 to 2020. We develop both classical machine learning and transformer language models to predict the left- or right-leaning of parliamentarians based on their given speeches on the topic of migrants. With both types of models showing great predictive success, we continue with explaining their decisions. Using explainability techniques, we identify keywords and phrases that have the strongest influence in predicting political leanings on the topic, with left-leaning parliamentarians using concepts such as people and unity and speak about refugees, and right-leaning parliamentarians using concepts such as nationality and focus more on illegal migrants. This research is an example that understanding the reasoning behind predictions can not just be beneficial for AI engineers to improve their models, but it can also be helpful as a tool in the qualitative analysis steps in interdisciplinary research. △ Less

Submitted 8 May, 2023; originally announced May 2023.

Comments: In 10th Language and Technology Conference (LTC 2023). April 2023

arXiv:2105.14898 [pdf, other]

doi 10.1371/journal.pone.0265602

Retweet communities reveal the main sources of hate speech

Authors: Bojan Evkoski, Andraz Pelicon, Igor Mozetic, Nikola Ljubesic, Petra Kralj Novak

Abstract: We address a challenging problem of identifying main sources of hate speech on Twitter. On one hand, we carefully annotate a large set of tweets for hate speech, and deploy advanced deep learning to produce high quality hate speech classification models. On the other hand, we create retweet networks, detect communities and monitor their evolution through time. This combined approach is applied to… ▽ More We address a challenging problem of identifying main sources of hate speech on Twitter. On one hand, we carefully annotate a large set of tweets for hate speech, and deploy advanced deep learning to produce high quality hate speech classification models. On the other hand, we create retweet networks, detect communities and monitor their evolution through time. This combined approach is applied to three years of Slovenian Twitter data. We report a number of interesting results. Hate speech is dominated by offensive tweets, related to political and ideological issues. The share of unacceptable tweets is moderately increasing with time, from the initial 20% to 30% by the end of 2020. Unacceptable tweets are retweeted significantly more often than acceptable tweets. About 60% of unacceptable tweets are produced by a single right-wing community of only moderate size. Institutional Twitter accounts and media accounts post significantly less unacceptable tweets than individual accounts. In fact, the main sources of unacceptable tweets are anonymous accounts, and accounts that were suspended or closed during the years 2018-2020. △ Less

Submitted 17 March, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

Journal ref: B. Evkoski, A. Pelicon, I. Mozetič, N. Ljubešić, P. Kralj Novak. Retweet communities reveal the main sources of hate speech, PLoS ONE 17(3): e0265602, 2022

arXiv:2105.06214 [pdf, other]

doi 10.1371/journal.pone.0256175

Community evolution in retweet networks

Authors: Bojan Evkoski, Igor Mozetic, Nikola Ljubesic, Petra Kralj Novak

Abstract: Communities in social networks often reflect close social ties between their members and their evolution through time. We propose an approach that tracks two aspects of community evolution in retweet networks: flow of the members in, out and between the communities, and their influence. We start with high resolution time windows, and then select several timepoints which exhibit large differences b… ▽ More Communities in social networks often reflect close social ties between their members and their evolution through time. We propose an approach that tracks two aspects of community evolution in retweet networks: flow of the members in, out and between the communities, and their influence. We start with high resolution time windows, and then select several timepoints which exhibit large differences between the communities. For community detection, we propose a two-stage approach. In the first stage, we apply an enhanced Louvain algorithm, called Ensemble Louvain, to find stable communities. In the second stage, we form influence links between these communities, and identify linked super-communities. For the detected communities, we compute internal and external influence, and for individual users, the retweet h-index influence. We apply the proposed approach to three years of Twitter data of all Slovenian tweets. The analysis shows that the Slovenian tweetosphere is dominated by politics, that the left-leaning communities are larger, but that the right-leaning communities and users exhibit significantly higher impact. An interesting observation is that retweet networks change relatively gradually, despite such events as the emergence of the Covid-19 pandemic or the change of government. △ Less

Submitted 2 September, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

Journal ref: PLoS ONE 16(9): e0256175, 2021

arXiv:1911.11750 [pdf, other]

A Measure of Similarity in Textual Data Using Spearman's Rank Correlation Coefficient

Authors: Nino Arsov, Milan Dukovski, Blagoja Evkoski, Stefan Cvetkovski

Abstract: In the last decade, many diverse advances have occurred in the field of information extraction from data. Information extraction in its simplest form takes place in computing environments, where structured data can be extracted through a series of queries. The continuous expansion of quantities of data have therefore provided an opportunity for knowledge extraction (KE) from a textual document (TD… ▽ More In the last decade, many diverse advances have occurred in the field of information extraction from data. Information extraction in its simplest form takes place in computing environments, where structured data can be extracted through a series of queries. The continuous expansion of quantities of data have therefore provided an opportunity for knowledge extraction (KE) from a textual document (TD). A typical problem of this kind is the extraction of common characteristics and knowledge from a group of TDs, with the possibility to group such similar TDs in a process known as clustering. In this paper we present a technique for such KE among a group of TDs related to the common characteristics and meaning of their content. Our technique is based on the Spearman's Rank Correlation Coefficient (SRCC), for which the conducted experiments have proven to be comprehensive measure to achieve a high-quality KE. △ Less

Submitted 26 November, 2019; originally announced November 2019.

Showing 1–4 of 4 results for author: Evkoski, B