AI-enhanced Collective Intelligence: The State of the Art and Prospects

Abstract: The current societal challenges exceed the capacity of human individual or collective effort alone. As AI evolves, its role within human collectives is poised to vary from an assistive tool to a participatory member. Humans and AI possess complementary capabilities that, when synergized, can achieve a level of collective intelligence that surpasses the collective capabilities of either humans or AI in isolation. However, the interactions in human-AI systems are inherently complex, involving intricate processes and interdependencies. This review incorporates perspectives from network science to conceptualize a multilayer representation of human-AI collective intelligence, comprising a cognition layer, a physical layer, and an information layer. Within this multilayer network, humans and AI agents exhibit varying characteristics; humans differ in diversity from surface-level to deep-level attributes, while AI agents range in degrees of functionality and anthropomorphism. The interplay among these agents shapes the overall structure and dynamics of the system. We explore how agents' diversity and interactions influence the system's collective intelligence. Furthermore, we present an analysis of real-world instances of AI-enhanced collective intelligence. We conclude by addressing the potential challenges in AI-enhanced collective intelligence and offer perspectives on future developments in this field.

Submitted 19 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 27 pages, 2 figures

arXiv:2402.14410

Human-machine social systems

Authors: Milena Tsvetkova, Taha Yasseri, Niccolo Pescetelli, Tobias Werner

Abstract: From fake accounts on social media and generative-AI bots such as ChatGPT to high-frequency trading algorithms on financial markets and self-driving vehicles on the streets, robots, bots, and algorithms are proliferating and permeating our communication channels, social interactions, economic transactions, and transportation arteries. Networks of multiple interdependent and interacting humans and autonomous machines constitute complex adaptive social systems where the collective outcomes cannot be simply deduced from either human or machine behavior alone. Under this paradigm, we review recent experimental, theoretical, and observational research from across a range of disciplines - robotics, human-computer interaction, web science, complexity science, computational social science, finance, economics, political science, social psychology, and sociology. We identify general dynamics and patterns in situations of competition, coordination, cooperation, contagion, and collective decision-making, and contextualize them in four prominent existing human-machine communities: high-frequency trading markets, the social media platform formerly known as Twitter, the open-collaboration encyclopedia Wikipedia, and the news aggregation and discussion community Reddit. We conclude with suggestions for the research, design, and governance of human-machine social systems, which are necessary to reduce misinformation, prevent financial crashes, improve road safety, overcome labor market disruptions, and enable a better human future.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 44 pages, 2 figures

ACM Class: A.1; C.2.4; H.1.2; J.4; K.4.0; K.6.0

arXiv:2303.12759

Using word embeddings to analyse audience effects and individual differences in parenting Subreddits

Authors: Melody Sepahpour-Fard, Michael Quayle, Maria Schuld, Taha Yasseri

Abstract: Human beings adapt their language to the audience they interact with. To study the impact of audience and gender in a natural setting, we choose a domain where gender plays a particularly salient role: parenting. We collect posts from the three popular parenting Subreddits (i.e., topical communities on Reddit) r/Daddit, r/Mommit, and r/Parenting. These three Subreddits gather different audiences, respectively, self-identifying as fathers and mothers (ostensibly single-gender), and parents (explicitly mixed-gender). By selecting a sample of users who have published on both a single-gender and a mixed-gender Subreddit, we are able to explore both audience and gender effects. We analyse posts with word embeddings by adding the username as a token in the corpus. This way, we are able to compare user-tokens to word-tokens and measure their similarity. We also investigate individual differences in this context by comparing users who exhibit significant changes in their behaviour (high self-monitors) with those who show less variation (low self-monitors). Results show that r/Parenting users generally discuss a great diversity of topics while fathers focus more on advising others on educational and family matters. Mothers in r/Mommit distinguish themselves from other groups by primarily discussing topics such as medical care, sleep and potty training, and food. Both mothers and fathers celebrate parenting events and describe or comment on the physical appearance of their children with a single-gender audience.
In terms of individual differences, we find that, especially on r/Parenting, high self-monitors tend to conform more to the norms of the Subreddit by discussing more of the topics associated with the Subreddit. In conclusion, this study shows how mothers and fathers express different concerns and change their behaviour for different group-based audiences.

Submitted 1 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: 17 pages, 7 figures

arXiv:2303.10036

Individual differences in knowledge network navigation

Authors: Manran Zhu, Taha Yasseri, János Kertész

Abstract: With the rapid accumulation of online information, efficient web navigation has grown vital yet challenging. To create an easily navigable cyberspace catering to diverse demographics, understanding how people navigate differently is paramount. While previous research has unveiled individual differences in spatial navigation, such differences in knowledge space navigation remain sparse. To bridge this gap, we conducted an online experiment where participants played a navigation game on Wikipedia and completed personal information questionnaires. Our analysis shows that age negatively affects knowledge space navigation performance, while multilingualism enhances it. Under time pressure, participants' performance improves across trials and males outperform females, an effect not observed in games without time pressure. In our experiment, successful route-finding is usually not related to abilities of innovative exploration of routes. Our results underline the importance of age, multilingualism and time constraint in the knowledge space navigation.

Submitted 19 March, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: 14 pages, 4 figures

arXiv:2211.07616

Between News and History: Identifying Networked Topics of Collective Attention on Wikipedia

Authors: Patrick Gildersleve, Renaud Lambiotte, Taha Yasseri

Abstract: The digital information landscape has introduced a new dimension to understanding how we collectively react to new information and preserve it at the societal level. This, together with the emergence of platforms such as Wikipedia, has challenged traditional views on the relationship between current events and historical accounts of events, with an ever-shrinking divide between "news" and "history". Wikipedia's place as the Internet's primary reference work thus poses the question of how it represents both traditional encyclopaedic knowledge and evolving important news stories. In other words, how is information on and attention towards current events integrated into the existing topical structures of Wikipedia? To address this we develop a temporal community detection approach towards topic detection that takes into account both short term dynamics of attention as well as long term article network structures. We apply this method to a dataset of one year of current events on Wikipedia to identify clusters distinct from those that would be found solely from page view time series correlations or static network structure. We are able to resolve the topics that more strongly reflect unfolding current events vs more established knowledge by the relative importance of collective attention dynamics vs link structures. We also offer important developments by identifying and describing the emergent topics on Wikipedia.
This work provides a means of distinguishing how these information and attention clusters are related to Wikipedia's twin faces of encyclopaedic knowledge and current events -- crucial to understanding the production and consumption of knowledge in the digital age.

Submitted 12 May, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2207.01352

doi 10.1038/s41598-023-39035-3

Terrorist attacks sharpen the binary perception of "Us" vs. "Them"

Authors: Milan Jović, Lovro Šubelj, Tea Golob, Matej Makarovič, Taha Yasseri, Danijela Boberić Krstićev, Srdjan Škrbić, Zoran Levnajić

Abstract: Terrorist attacks not only harm citizens but also shift their attention, which has long-lasting impacts on public opinion and government policies. Yet measuring the changes in public attention beyond media coverage has been methodologically challenging. Here we approach this problem by starting from Wikipedia's répertoire of 5.8 million articles and a sample of 15 recent terrorist attacks. We deploy a complex exclusion procedure to identify topics and themes that consistently received a significant increase in attention due to these incidents. Examining their contents reveals a clear picture: terrorist attacks foster establishing a sharp boundary between "Us" (the target society) and "Them" (the terrorist as the enemy). In the midst of this, one seeks to construct identities of both sides. This triggers curiosity to learn more about "Them" and soul-search for a clearer understanding of "Us". This systematic analysis of public reactions to disruptive events could help mitigate their societal consequences.

Submitted 3 August, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Peer-reviewed; Published

Journal ref: Sci Rep 13, 12451 (2023)

arXiv:2207.01042

doi 10.1016/bs.pbr.2022.07.001

Collective Memory in the Digital Age

Authors: Taha Yasseri, Patrick Gildersleve, Lea David

Abstract: The digital transformation of our societies and in particular information and communication technologies have revolutionized how we generate, communicate, and acquire information. Collective memory as a core and unifying force in our societies has not been an exception among many societal concepts which have been revolutionized through digital transformation. In this chapter, we have distinguished between "the digitalized collective memory" and "collective memory in the digital age". In addition to discussing these two main concepts, we discuss how digital tools and trace data can open doorways into the study of collective memory that is formed inside and outside of the digital space.

Submitted 3 July, 2022; originally announced July 2022.

Comments: This is a preprint of a Chapter to appear in "Collective Memory" Edited by Shane O'Mara and to be published by Elsevier in 2022. Please cite as: Yasseri, T., Gildersleve, P., and David, L. (2022), Collective Memory in the Digital Age, In S. O'Mara (Ed.), Collective Memory, Elsevier

Journal ref: Progress in Brain Research 274-1, pp 203-226 (2022)

arXiv:2104.13754

doi 10.1145/3578645

Can crowdsourcing rescue the social marketplace of ideas?

Authors: Taha Yasseri, Filippo Menczer

Abstract: Facebook and Twitter recently announced community-based review platforms to address misinformation. We provide an overview of the potential affordances of such community-based approaches to content moderation based on past research and preliminary analysis of Twitter's Birdwatch data. While our analysis generally supports a community-based approach to content moderation, it also warns against potential pitfalls, particularly when the implementation of the new infrastructure focuses on crowd-based "validation" rather than "collaboration." We call for multidisciplinary research utilizing methods from complex systems studies, behavioural sociology, and computational social science to advance the research on crowd-based content moderation.

Submitted 19 December, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: In Press in Communications of the ACM (CACM)

Journal ref: Communications of the ACM (2023)

arXiv:2104.04074

The Kaleidoscope of Privacy: Differences across French, German, UK, and US GDPR Media Discourse

Authors: Mary Sanford, Taha Yasseri

Abstract: Conceptions of privacy differ by culture. In the Internet age, digital tools continuously challenge the way users, technologists, and governments define, value, and protect privacy. National and supranational entities attempt to regulate privacy and protect data managed online. The European Union passed the General Data Protection Regulation (GDPR), which took effect on 25 May 2018. The research presented here draws on two years of media reporting on GDPR from French, German, UK, and US sources. We use the unsupervised machine learning method of topic modelling to compare the thematic structure of the news articles across time and geographic regions. Our work emphasises the relevance of regional differences regarding valuations of privacy and potential obstacles to the implementation of unilateral data protection regulation such as GDPR. We find that the topics and trends over time in GDPR media coverage of the four countries reflect the differences found across their traditional privacy cultures.

Submitted 31 March, 2021; originally announced April 2021.

Comments: Under Review

arXiv:2101.02695

doi 10.3389/fphy.2021.650720

Gender Imbalance and Spatiotemporal Patterns of Contributions to Citizen Science Projects: the case of Zooniverse

Authors: Khairunnisa Ibrahim, Samuel Khodursky, Taha Yasseri

Abstract: Citizen Science is research undertaken by professional scientists and members of the public collaboratively. Despite numerous benefits of citizen science for both the advancement of science and the community of the citizen scientists, there is still no comprehensive knowledge of patterns of contributions, and the demography of contributors to citizen science projects. In this paper we provide a first overview of spatiotemporal and gender distribution of citizen science workforce by analyzing 54 million classifications contributed by more than 340 thousand citizen science volunteers from 198 countries to one of the largest citizen science platforms, Zooniverse. First we report on the uneven geographical distribution of the citizen scientist and model the variations among countries based on the socio-economic conditions as well as the level of research investment in each country. Analyzing the temporal features of contributions, we report on high "burstiness" of participation instances as well as the leisurely nature of participation suggested by the time of the day that the citizen scientists were the most active. Finally, we discuss the gender imbalance among citizen scientists (about 30% female) and compare it with other collaborative projects as well as the gender distribution in more formal scientific activities.
Citizen science projects need further attention from outside of the academic community, and our findings can help attract the attention of public and private stakeholders, as well as to inform the design of the platforms and science policy making processes.

Submitted 7 January, 2021; originally announced January 2021.

Comments: Under Review

Journal ref: Front. Phys. 9:650720 (2021)

arXiv:2101.01270

doi 10.1007/s12144-022-02717-8

What drives passion? An empirical examination on the impact of personality trait interactions and job environments on work passion

Authors: Annika Breu, Taha Yasseri

Abstract: Passionate employees are essential for organisational success as they foster higher performance and exhibit lower turnover or absenteeism. While a large body of research has investigated the consequences of passion, we know only little about its antecedents. Integrating trait interaction theory with trait activation theory, this paper examines how personality traits, i.e. conscientiousness, agreeableness, and neuroticism impact passion at work across different job situations. Passion has been conceptualized as a two-dimensional construct, consisting of harmonious work passion (HWP) and obsessive work passion (OWP). Our study is based on a sample of N = 824 participants from the myPersonality project. We find a positive relationship between neuroticism and OWP in enterprising environments. Further, we find a three-way interaction between conscientiousness, agreeableness, and enterprising environment in predicting OWP. Our findings imply that the impact of personality configurations on different forms of passion is contingent on the job environment. Moreover, in line with self-regulation theory, the results reveal agreeableness as a "cool influencer" and neuroticism as a "hot influencer" of the relationship between conscientiousness and work passion. We derive practical implications for organisations on how to foster work passion, particularly HWP, in organisations.

Submitted 4 January, 2022; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: To Appear in Current Psychology

Journal ref: Curr Psychol (2022)

arXiv:2101.00296

Tweeting for the Cause: Network analysis of UK petition sharing

Authors: Peter Cihon, Taha Yasseri, Scott Hale, Helen Margetts

Abstract: Online government petitions represent a new data-rich mode of political participation. This work examines the thus far understudied dynamics of sharing petitions on social media in order to garner signatures and, ultimately, a government response. Using 20 months of Twitter data comprising over 1 million tweets linking to a petition, we perform analyses of networks constructed of petitions and supporters on Twitter, revealing implicit social dynamics therein. We find that Twitter users do not exclusively share petitions on one issue nor do they share exclusively popular petitions. Among the over 240,000 Twitter users, we find latent support groups, with the most central users primarily being politically active "average" individuals. Twitter as a platform for sharing government petitions, thus, appears to hold potential to foster the creation of and coordination among a new form of latent support interest groups online.

Submitted 1 January, 2021; originally announced January 2021.

Comments: Presented at IPP2016 Conference, Oxford, UK. http://blogs.oii.ox.ac.uk/ipp-conference/2016.html

arXiv:2009.11038

The cost of coordination can exceed the benefit of collaboration in performing complex tasks

Authors: Vince J. Straub, Milena Tsvetkova, Taha Yasseri

Abstract: Humans and other intelligent agents often rely on collective decision making based on an intuition that groups outperform individuals. However, at present, we lack a complete theoretical understanding of when groups perform better. Here we examine performance in collective decision-making in the context of a real-world citizen science task environment in which individuals with manipulated differences in task-relevant training collaborated. We find 1) dyads gradually improve in performance but do not experience a collective benefit compared to individuals in most situations; 2) the cost of coordination to efficiency and speed that results when switching to a dyadic context after training individually is consistently larger than the leverage of having a partner, even if they are expertly trained in that task; and 3) on the most complex tasks having an additional expert in the dyad who is adequately trained improves accuracy. These findings highlight that the extent of training received by an individual, the complexity of the task at hand, and the desired performance indicator are all critical factors that need to be accounted for when weighing up the benefits of collective decision-making.

Submitted 27 January, 2023; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: in Press in Collective Intelligence. Please cite the published version using the DOI below

arXiv:2006.15648

doi 10.1080/13691058.2021.1901145

Selling sex: what determines rates and popularity? An analysis of 11,500 online profiles

Authors: Alicia Mergenthaler, Taha Yasseri

Abstract: Sex work, or the exchange of sexual services for money or goods, is ubiquitous across eras and cultures. However, the practice of selling sex is often hidden due to stigma and the varying legal status of sex work. Online platforms that sex workers use to advertise services have become an increasingly important means of studying a market that is largely hidden. Although prior literature has primarily shed light on sex work from a public health or policy perspective (focusing largely on female sex workers), there are few studies that empirically research patterns of service provision in online sex work. This study investigated the determinants of pricing and popularity in the market for commercial sexual services online by using data from the largest UK network of online sexual services, a platform that is the industry-standard for sex workers. While the size of these influences varies across genders, nationality, age and the services provided are shown to be primary drivers of rates and popularity in sex work.

Submitted 11 May, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

Comments: Main manuscript and Supplementary Information

Journal ref: Culture, Health & Sexuality, 24:7, 935-952 (2022)

arXiv:2001.02878

doi 10.1080/0022250X.2020.1818078

Positive algorithmic bias cannot stop fragmentation in homophilic networks

Authors: Chris Blex, Taha Yasseri

Abstract: Fragmentation, echo chambers, and their amelioration in social networks have been a growing concern in the academic and non-academic world. This paper shows how, under the assumption of homophily, echo chambers and fragmentation are system-immanent phenomena of highly flexible social networks, even under ideal conditions for heterogeneity. We achieve this by finding an analytical, network-based solution to the Schelling model and by proving that weak ties do not hinder the process. Furthermore, we derive that no level of positive algorithmic bias in the form of rewiring is capable of preventing fragmentation and its effect on reducing the fragmentation speed is negligible.

Submitted 9 September, 2021; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: Cite as: Chris Blex & Taha Yasseri (2020) Positive algorithmic bias cannot stop fragmentation in homophilic networks, The Journal of Mathematical Sociology, DOI: 10.1080/0022250X.2020.1818078

Journal ref: The Journal of Mathematical Sociology, 46:1, 80-97 (2022)

arXiv:1911.12275

doi 10.1007/s42001-021-00158-0

Fooling with facts: Quantifying anchoring bias through a large-scale online experiment

Authors: Taha Yasseri, Jannie Reher

Abstract: Living in the 'Information Age' means that not only access to information has become easier but also that the distribution of information is more dynamic than ever. Through a large-scale online field experiment, we provide new empirical evidence for the presence of the anchoring bias in people's judgment due to irrational reliance on a piece of information that they are initially given. The comparison of the anchoring stimuli and respective responses across different tasks reveals a positive, yet complex relationship between the anchors and the bias in participants' predictions of the outcomes of events in the future. Participants in the treatment group were equally susceptible to the anchors regardless of their level of engagement, previous performance, or gender. Given the strong and ubiquitous influence of anchors quantified here, we should take great care to closely monitor and regulate the distribution of information online to facilitate less biased decision making.

Submitted 27 November, 2019; originally announced November 2019.

Comments: Under Review, 15 pages + Supplementary Information

Journal ref: J Comput Soc Sc 5, 1001-1021 (2022)

arXiv:1910.05794

doi 10.1080/18335330.2021.1892166

Islamophobes are not all the same! A study of far right actors on Twitter

Authors: Bertie Vidgen, Taha Yasseri, Helen Margetts

Abstract: Far-right actors are often purveyors of Islamophobic hate speech online, using social media to spread divisive and prejudiced messages which can stir up intergroup tensions and conflict. Hateful content can inflict harm on targeted victims, create a sense of fear amongst communities and stir up intergroup tensions and conflict. Accordingly, there is a pressing need to better understand at a granular level how Islamophobia manifests online and who produces it. We investigate the dynamics of Islamophobia amongst followers of a prominent UK far right political party on Twitter, the British National Party. Analysing a new data set of five million tweets, collected over a period of one year, using a machine learning classifier and latent Markov modelling, we identify seven types of Islamophobic far right actors, capturing qualitative, quantitative and temporal differences in their behaviour. Notably, we show that a small number of users are responsible for most of the Islamophobia that we observe. We then discuss the policy implications of this typology in the context of social media regulation.

Submitted 8 March, 2021; v1 submitted 13 October, 2019; originally announced October 2019.

Journal ref: Journal of Policing, Intelligence and Counter Terrorism, 17:1, 1-23 (2022)

arXiv:1908.08991 [pdf, other]

doi 10.1098/rsos.210617

Football is becoming more predictable; Network analysis of 88 thousands matches in 11 major leagues

Authors: Victor Martins Maimone, Taha Yasseri

Abstract: In recent years excessive monetization of football and professionalism among the players has been argued to have affected the quality of the match in different ways. On the one hand, playing football has become a high-income profession and the players are highly motivated; on the other hand, stronger teams have higher incomes and therefore afford better players leading to an even stronger appearance in tournaments that can make the game more imbalanced and hence predictable. To quantify and document this observation, in this work we take a minimalist network science approach to measure the predictability of football over 26 years in major European leagues. We show that over time, the games in major leagues have indeed become more predictable. We provide further support for this observation by showing that inequality between teams has increased and the home-field advantage has been vanishing ubiquitously. We do not include any direct analysis on the effects of monetization on football's predictability or therefore, lack of excitement, however, we propose several hypotheses which could be tested in future analyses.

Submitted 6 July, 2022; v1 submitted 23 August, 2019; originally announced August 2019.

Comments: revised version - before publication in Royal Society Open Science

Journal ref: Royal Society Open Science, 8(12), 210617 (2021)

arXiv:1908.08859 [pdf, other]

doi 10.1007/s41109-021-00379-2

Dissent and Rebellion in the House of Commons: A Social Network Analysis of Brexit-Related Divisions in the 57th Parliament

Authors: Carla Intal, Taha Yasseri

Abstract: The British party system is known for its discipline and cohesion, but it remains wedged on one issue: European integration. We offer a methodology using social network analysis that considers the individual interactions of MPs in the voting process. Using public Parliamentary records, we scraped votes of individual MPs in the 57th parliament (June 2017 to April 2019), computed pairwise similarity scores and calculated rebellion metrics based on eigenvector centralities. Comparing the networks of Brexit- and non-Brexit divisions, our methodology was able to detect a significant difference in eurosceptic behaviour for the former, and using a rebellion metric we predicted how MPs would vote in a forthcoming Brexit deal with over 90% accuracy.

Submitted 27 May, 2021; v1 submitted 23 August, 2019; originally announced August 2019.

Comments: Published

Journal ref: Appl Netw Sci 6, 36 (2021)

arXiv:1907.01536 [pdf]

doi 10.1007/s11077-020-09395-y

What, When and Where of petitions submitted to the UK Government during a time of chaos

Authors: Bertie Vidgen, Taha Yasseri

Abstract: In times marked by political turbulence and uncertainty, as well as increasing divisiveness and hyperpartisanship, Governments need to use every tool at their disposal to understand and respond to the concerns of their citizens. We study issues raised by the UK public to the Government during 2015-2017 (surrounding the UK EU-membership referendum), mining public opinion from a dataset of 10,950 petitions (representing 30.5 million signatures). We extract the main issues with a ground-up natural language processing (NLP) method, latent Dirichlet allocation (LDA). We then investigate their temporal dynamics and geographic features. We show that whilst the popularity of some issues is stable across the two years, others are highly influenced by external events, such as the referendum in June 2016. We also study the relationship between petitions' issues and where their signatories are geographically located. We show that some issues receive support from across the whole country but others are far more local. We then identify six distinct clusters of constituencies based on the issues which constituents sign. Finally, we validate our approach by comparing the petitions' issues with the top issues reported in Ipsos MORI survey data. These results show the huge power of computationally analyzing petitions to understand not only what issues citizens are concerned about but also when and from where.

Submitted 2 July, 2019; originally announced July 2019.

Comments: Preprint; under review

Journal ref: Policy Sci 53, 535-557 (2020)

arXiv:1904.06310 [pdf, other]

Female scholars need to achieve more for equal public recognition

Authors: Menno H. Schellekens, Floris Holstege, Taha Yasseri

Abstract: Different kinds of "gender gap" have been reported in different walks of the scientific life, almost always favouring male scientists over females. In this work, for the first time, we present a large-scale empirical analysis to ask whether female scientists with the same level of scientific accomplishment are as likely as males to be recognised. We particularly focus on Wikipedia, the open online encyclopedia whose open nature allows us to have a proxy of community recognition. We calculate the probability of appearing on Wikipedia as a scientist for both male and female scholars in three different fields. We find that women in Physics, Economics and Philosophy are considerably less likely than men to be recognised on Wikipedia across all levels of achievement.

Submitted 16 April, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

Comments: Under review

arXiv:1812.10400 [pdf]

Detecting weak and strong Islamophobic hate speech on social media

Authors: Bertie Vidgen, Taha Yasseri

Abstract: Islamophobic hate speech on social media inflicts considerable harm on both targeted individuals and wider society, and also risks reputational damage for the host platforms. Accordingly, there is a pressing need for robust tools to detect and classify Islamophobic hate speech at scale. Previous research has largely approached the detection of Islamophobic hate speech on social media as a binary task. However, the varied nature of Islamophobia means that this is often inappropriate for both theoretically-informed social science and effectively monitoring social media. Drawing on in-depth conceptual work we build a multi-class classifier which distinguishes between non-Islamophobic, weak Islamophobic and strong Islamophobic content. Accuracy is 77.6% and balanced accuracy is 83%. We apply the classifier to a dataset of 109,488 tweets produced by far right Twitter accounts during 2017. Whilst most tweets are not Islamophobic, weak Islamophobia is considerably more prevalent (36,963 tweets) than strong (14,895 tweets). Our main input feature is a GloVe word embeddings model trained on a newly collected corpus of 140 million tweets. It outperforms a generic word embeddings model by 5.9 percentage points, demonstrating the importance of context. Unexpectedly, we also find that a one-against-one multi class SVM outperforms a deep learning algorithm.

Submitted 12 December, 2018; originally announced December 2018.

arXiv:1810.05485 [pdf, other]

doi 10.1098/rsos.182103

Social capital predicts corruption risk in towns

Authors: Johannes Wachs, Taha Yasseri, Balázs Lengyel, János Kertész

Abstract: Corruption is a social plague: gains accrue to small groups, while its costs are borne by everyone. Significant variation in its level between and within countries suggests a relationship between social structure and the prevalence of corruption, yet, large scale empirical studies thereof have been missing due to lack of data. In this paper we relate the structural characteristics of social capital of towns with corruption in their local governments. Using datasets from Hungary, we quantify corruption risk by suppressed competition and lack of transparency in the town's awarded public contracts. We characterize social capital using social network data from a popular online platform. Controlling for social, economic, and political factors, we find that settlements with fragmented social networks, indicating an excess of \textit{bonding social capital} have higher corruption risk and towns with more diverse external connectivity, suggesting a surplus of \textit{bridging social capital} are less exposed to corruption. We interpret fragmentation as fostering in-group favoritism and conformity, which increase corruption, while diversity facilitates impartiality in public life and stifles corruption.

Submitted 12 October, 2018; originally announced October 2018.

Comments: Submitted

Journal ref: Royal Society Open Science, 2019

arXiv:1809.10032 [pdf]

doi 10.1007/s42001-021-00132-w

Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis

Authors: Rachel Dinh, Patrick Gildersleve, Chris Blex, Taha Yasseri

Abstract: Have we become more tolerant of dating people of different social backgrounds compared to ten years ago? Has the rise of online dating exacerbated or alleviated gender inequalities in modern courtship? Are the most attractive people on these platforms necessarily the most successful? In this work, we examine the mate preferences and communication patterns of male and female users of the online dating site eHarmony over the past decade to identify how attitudes and behaviors have changed over this time period. While other studies have investigated disparities in user behavior between male and female users, this study is unique in its longitudinal approach. Specifically, we analyze how men and women differ in their preferences for certain traits in potential partners and how those preferences have changed over time. The second line of inquiry investigates to what extent physical attractiveness determines the rate of messages a user receives, and how this relationship varies between men and women. Thirdly, we explore whether online dating practices between males and females have become more equal over time or if biases and inequalities have remained constant (or increased). Fourthly, we study the behavioural traits in sending and replying to messages based on one's own experience of receiving messages and being replied to. Finally, we found that similarity between profiles is not a predictor for success except for the number of children and smoking habits.
This work could have broader implications for shifting gender norms and social attitudes, reflected in online courtship rituals. Apart from the data-based research, we connect the results to existing theories that concern the role of ICTs in societal change. As searching for love online becomes increasingly common across generations and geographies, these findings may shed light on how people can build relationships through the Internet.

Submitted 28 June, 2020; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: Preprint, under review

Journal ref: J Comput Soc Sc 5, 401-426 (2022)

arXiv:1712.08647 [pdf, other]

doi 10.1098/rsos.172320

Emo, Love, and God: Making Sense of Urban Dictionary, a Crowd-Sourced Online Dictionary

Authors: Dong Nguyen, Barbara McGillivray, Taha Yasseri

Abstract: The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the "wisdom of the crowd" has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often un-monitored environment of such projects may make them susceptible to low quality content. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary's voting system. The low threshold to include new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation.

Submitted 5 April, 2018; v1 submitted 22 December, 2017; originally announced December 2017.

Comments: Accepted, to appear in Royal Society Open Science. Data available upon request

Journal ref: Royal Society Open Science, 5(5), 2018

arXiv:1711.10380 [pdf]

Social Media, Money, and Politics: Campaign Finance in the 2016 US Congressional Cycle

Authors: Lily McElwee, Taha Yasseri

Abstract: With social media penetration deepening among both citizens and political figures, there is a pressing need to understand whether and how political use of major platforms is electorally influential. Particularly, the literature focused on campaign usage is thin and often describes the engagement strategies of politicians or attempts to quantify the impact of social media engagement on political learning, participation, or voting. Few have considered implications for campaign fundraising despite its recognized importance in American politics. This paper is the first to quantify a financial payoff for social media campaigning. Drawing on candidate-level data from Facebook and Twitter, Google Trends, Wikipedia page views, and Federal Election Commission (FEC) donation records, we analyze the relationship between the topic and volume of social media content and campaign funds received by all 108 candidates in the 2016 US Senate general elections. By applying an unsupervised learning approach to identify themes in candidate content across the platforms, we find that more frequent posting overall and of issue-related content are associated with higher donation income when controlling for incumbency, state population, and information-seeking about a candidate, though campaigning-related content has a stronger effect than the latter when the number rather than value of donations is considered.

Submitted 28 November, 2017; originally announced November 2017.

Comments: Under review. Main article + Supplementary Information

arXiv:1711.09074 [pdf]

doi 10.3389/fdigh.2018.00028

Topic Modelling of Everyday Sexism Project Entries

Authors: Sophie Melville, Kathryn Eccles, Taha Yasseri

Abstract: The Everyday Sexism Project documents everyday examples of sexism reported by volunteer contributors from all around the world. It collected 100,000 entries in 13+ languages within the first 3 years of its existence. The content of reports in various languages submitted to Everyday Sexism is a valuable source of crowdsourced information with great potential for feminist and gender studies. In this paper, we take a computational approach to analyze the content of reports. We use topic-modelling techniques to extract emerging topics and concepts from the reports, and to map the semantic relations between those topics. The resulting picture closely resembles and adds to that arrived at through qualitative analysis, showing that this form of topic modeling could be useful for sifting through datasets that had not previously been subject to any analysis. More precisely, we come up with a map of topics for two different resolutions of our topic model and discuss the connection between the identified topics. In the low resolution picture, for instance, we found Public space/Street, Online, Work related/Office, Transport, School, Media harassment, and Domestic abuse. Among these, the strongest connection is between Public space/Street harassment and Domestic abuse and sexism in personal relationships. The strength of the relationships between topics illustrates the fluid and ubiquitous nature of sexism, with no single experience being unrelated to another.

Submitted 5 April, 2018; v1 submitted 24 November, 2017; originally announced November 2017.

Comments: preprint, under review

Journal ref: Front. Digit. Humanit. 5:28 (2019)

arXiv:1711.05701 [pdf]

doi 10.1016/j.socnet.2019.10.005

Social Complex Contagion in Music Listenership: A Natural Experiment with 1.3 Million Participants

Authors: John Ternovski, Taha Yasseri

Abstract: Can live music events generate complex contagion in music streaming? This paper finds evidence in the affirmative, but only for the most popular artists. We generate a novel dataset from Last.fm, a music tracking website, to analyse the listenership history of 1.3 million users over a two-month time horizon. We use daily play counts along with event attendance data to run a regression discontinuity analysis in order to show the causal impact of concert attendance on music listenership among attendees and their friends network. First, we show that attending a music artist's live concert increases that artist's listenership among the attendees of the concert by approximately 1 song per day per attendee (p-value<0.001). Moreover, we show that this effect is contagious and can spread to users who did not attend the event. However, the extent of contagion depends on the type of artist. We only observe contagious increases in listenership for well-established, popular artists (.06 more daily plays per friend of an attendee [p<0.001]), while the effect is absent for emerging stars. We also show that the contagion effect size increases monotonically with the number of friends who have attended the live event.

Submitted 15 November, 2017; originally announced November 2017.

Comments: Preprint, under review

Journal ref: Social Networks, Volume 61, 144-152, 2020

arXiv:1710.03326 [pdf, other]

doi 10.1007/978-3-319-73198-8_23

Inspiration, Captivation, and Misdirection: Emergent Properties in Networks of Online Navigation

Authors: Patrick Gildersleve, Taha Yasseri

Abstract: The World Wide Web (WWW) has fundamentally changed the ways billions of people are able to access information. Thus, understanding how people seek information online is an important issue of study. Wikipedia is a hugely important part of information provision on the web, with hundreds of millions of users browsing and contributing to its network of knowledge. The study of navigational behaviour on Wikipedia, due to the site's popularity and breadth of content, can reveal more general information seeking patterns that may be applied beyond Wikipedia and the Web. Our work addresses the relative shortcomings of existing literature in relating how information structure influences patterns of navigation online. We study aggregated clickstream data for articles on the English Wikipedia in the form of a weighted, directed navigational network. We introduce two parameters that describe how articles act to source and spread traffic through the network, based on their in/out strength and entropy. From these, we construct a navigational phase space where different article types occupy different, distinct regions, indicating how the structure of information online has differential effects on patterns of navigation. Finally, we go on to suggest applications for this analysis in identifying and correcting deficiencies in the Wikipedia page network that may also be adapted to more general information networks.

Submitted 12 October, 2017; v1 submitted 9 October, 2017; originally announced October 2017.

Journal ref: CompleNet 2018. Springer Proceedings in Complexity. Springer, Cham

arXiv:1609.04285 [pdf]

doi 10.1371/journal.pone.0171774

Even Good Bots Fight: The Case of Wikipedia

Authors: Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, Taha Yasseri

Abstract: In recent years, there has been a huge increase in the number of bots online, varying from Web crawlers for search engines, to chatbots for online customer service, spambots on social media, and content-editing bots in online collaboration communities. The online world has turned into an ecosystem of bots. However, our knowledge of how these automated agents are interacting with each other is rather poor. Bots are predictable automatons that do not have the capacity for emotions, meaning-making, creativity, and sociality and it is hence natural to expect interactions between bots to be relatively predictable and uneventful. In this article, we analyze the interactions between bots that edit articles on Wikipedia. We track the extent to which bots undid each other's edits over the period 2001-2010, model how pairs of bots interact over time, and identify different types of interaction trajectories. We find that, although Wikipedia bots are intended to support the encyclopedia, they often undo each other's edits and these sterile "fights" may sometimes continue for years. Unlike humans on Wikipedia, bots' interactions tend to occur over longer periods of time and to be more reciprocated. Yet, just like humans, bots in different cultural environments may behave differently. Our research suggests that even relatively "dumb" bots may give rise to complex interactions, and this carries important implications for Artificial Intelligence research.
Understanding what affects bot-bot interactions is crucial for managing social media well, providing adequate cyber-security, and designing well functioning autonomous vehicles.

Submitted 27 February, 2017; v1 submitted 14 September, 2016; originally announced September 2016.

Comments: Published in PLOS ONE

Journal ref: PLoS ONE (2017) 12(2):e0171774

arXiv:1609.02621 [pdf, ps, other]

doi 10.1126/sciadv.1602368

Memory Remains: Understanding Collective Memory in the Digital Age

Authors: Ruth García-Gavilanes, Anders Mollgaard, Milena Tsvetkova, Taha Yasseri

Abstract: Recently developed information communication technologies, particularly the Internet, have affected how we, both as individuals and as a society, create, store, and recall information. Internet also provides us with a great opportunity to study memory using transactional large scale data, in a quantitative framework similar to the practice in statistical physics. In this project, we make use of online data by analysing viewership statistics of Wikipedia articles on aircraft crashes. We study the relation between recent events and past events and particularly focus on understanding memory triggering patterns. We devise a quantitative model that explains the flow of viewership from a current event to past events based on similarity in time, geography, topic, and the hyperlink structure of Wikipedia articles. We show that on average the secondary flow of attention to past events generated by such remembering processes is larger than the primary attention flow to the current event. We are the first to report these cascading effects.

Submitted 8 September, 2016; originally announced September 2016.

Comments: Under Review

Journal ref: Science Advances 3(4), 2017

arXiv:1607.08127 [pdf, other]

doi 10.1371/journal.pone.0173561

Understanding and coping with extremism in an online collaborative environment

Authors: Csilla Rudas, Olivér Surányi, Taha Yasseri, János Török

Abstract: The Internet has provided us with great opportunities for large scale collaborative public good projects. Wikipedia is a predominant example of such projects where conflicts emerge and get resolved through bottom-up mechanisms leading to the emergence of the largest encyclopedia in human history. Disaccord arises whenever editors with different opinions try to produce an article reflecting a consensual view. The debates are mainly heated by editors with extremist views. Using a model of common value production, we show that the consensus can only be reached if extremist groups can actively take part in the discussion and if their views are also represented in the common outcome, at least temporarily. We show that banning problematic editors mostly hinders the consensus as it delays discussion and thus the whole consensus building process. To validate the model, relevant quantities are measured both in simulations and Wikipedia which show satisfactory agreement. We also consider the role of direct communication between editors both in the model and in Wikipedia data (by analysing the Wikipedia {\it talk} pages). While the model suggests that in certain conditions there is an optimal rate of "talking" vs "editing", it correctly predicts that in the current settings of Wikipedia, more activity in talk pages is associated with more controversy.

Submitted 27 July, 2016; originally announced July 2016.

Comments: 16 pages, 9 figures

arXiv:1607.07495 [pdf]

doi 10.1002/9781118998205.ch12

Understanding Communication Patterns in MOOCs: Combining Data Mining and qualitative methods

Authors: Rebecca Eynon, Isis Hjorth, Taha Yasseri, Nabeel Gillani

Abstract: Massive Open Online Courses (MOOCs) offer unprecedented opportunities to learn at scale. Within a few years, the phenomenon of crowd-based learning has gained enormous popularity with millions of learners across the globe participating in courses ranging from Popular Music to Astrophysics. They have captured the imaginations of many, attracting significant media attention - with The New York Times naming 2012 "The Year of the MOOC." For those engaged in learning analytics and educational data mining, MOOCs have provided an exciting opportunity to develop innovative methodologies that harness big data in education.

Submitted 25 July, 2016; originally announced July 2016.

Comments: Preprint of a chapter to appear in "Data Mining and Learning Analytics: Applications in Educational Research"

arXiv:1607.03320 [pdf]

What Happens After You Both Swipe Right: A Statistical Description of Mobile Dating Communications

Authors: Jennie Zhang, Taha Yasseri

Abstract: Mobile dating applications (MDAs) have skyrocketed in popularity in the last few years, with popular MDA Tinder alone matching 26 million pairs of users per day. In addition to becoming an influential part of modern dating culture, MDAs facilitate a unique form of mediated communication: dyadic mobile text messages between pairs of users who are not already acquainted. Furthermore, mobile dating has paved the way for analysis of these digital interactions via massive sets of data generated by the instant matching and messaging functions of its many platforms at an unprecedented scale. This paper looks at one of these sets of data: metadata of approximately two million conversations, containing 19 million messages, exchanged between 400,000 heterosexual users on an MDA. Through computational analysis methods, this study offers the very first large scale quantitative depiction of mobile dating as a whole. We report on differences in how heterosexual male and female users communicate with each other on MDAs, differences in behaviors of dyads of varying degrees of social separation, and factors leading to "success"-operationalized by the exchange of phone numbers between a match. For instance, we report that men initiate 79% of conversations--and while about half of the initial messages are responded to, conversations initiated by men are more likely to be reciprocated. We also report that the length of conversations, the waiting times, and the length of messages have fat-tailed distributions. That said, the majority of reciprocated conversations lead to a phone number exchange within the first 20 messages.

Submitted 12 July, 2016; originally announced July 2016.

Comments: Under Review, 22 pages, 8 tables, 8 figures

arXiv:1606.08829 [pdf, other]

doi 10.1098/rsos.160460

Dynamics and Biases of Online Attention: The Case of Aircraft Crashes

Authors: Ruth García-Gavilanes, Milena Tsvetkova, Taha Yasseri

Abstract: The Internet not only has changed the dynamics of our collective attention, but also through the transactional log of online activities, provides us with the opportunity to study attention dynamics at scale. In this paper, we particularly study attention to aircraft incidents and accidents using Wikipedia transactional data in two different language editions, English and Spanish. We study both the editorial activities on and the viewership of the articles about airline crashes. We analyse how the level of attention is influenced by different parameters such as number of deaths, airline region, and event locale and date. We find evidence that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region but only for events with a high number of deaths. Finally we show that the rate and time span of the decay of attention are independent of the number of deaths and a fast decay within about a week seems to be universal. We discuss the implications of these findings in the context of attention bias.

Submitted 11 September, 2016; v1 submitted 28 June, 2016; originally announced June 2016.

Comments: Accepted for publication in Royal Society Open Science

Journal ref: R. Soc. Open Sci. 2016 3 160460 (12 October 2016)

arXiv:1605.05139 [pdf]

doi 10.3389/fdigh.2017.00011

Two Roads Diverged: A Semantic Network Analysis of Guanxi on Twitter

Authors: Pu Yan, Taha Yasseri

Abstract: Guanxi, roughly translated as "social connection", is a term commonly used in the Chinese language. In this research, we employed a linguistic approach to explore popular discourses on Guanxi. Although sharing the same Confucian roots, Chinese communities inside and outside Mainland China have undergone different historical trajectories. Hence, we took a comparative approach to examine guanxi in Mainland China and in Taiwan, Hong Kong, and Macau (TW-HK-M). Comparing guanxi discourses in two Chinese societies aims at revealing the divergence of guanxi culture. The data for this research were collected on Twitter over a three-week period by searching tweets containing guanxi written in Simplified Chinese characters and in Traditional Chinese characters. After building, visualising, and conducting community detection on both semantic networks, two guanxi discourses were then compared in terms of their major concept sub-communities. This research aims at addressing two questions: Has the meaning of guanxi transformed in contemporary Chinese societies? And how do different socio-economic configurations affect the practice of guanxi? Results suggest that guanxi in interpersonal relationships has adapted to a new family structure in both Chinese societies. In addition, the practice of guanxi in business varies in Mainland China and in TW-HK-M. Furthermore, an extended domain was identified where guanxi is used in a macro-level discussion of state relations. Network representations of the guanxi discourses enabled reification of the concept and shed light on the understanding of social connections and social orders in contemporary China.

Submitted 17 May, 2016; originally announced May 2016.

Comments: under review. 29 pages + supplementary information

Journal ref: Front. Digit. Humanit. 4:11, 2017

arXiv:1605.04774 [pdf, ps, other]

doi 10.3389/fphy.2016.00034

A Biased Review of Biases in Twitter Studies on Political Collective Action

Authors: Peter Cihon, Taha Yasseri

Abstract: In recent years researchers have gravitated to social media platforms, especially Twitter, as fertile ground for empirical analysis of social phenomena. Social media provides researchers access to trace data of interactions and discourse that once went unrecorded in the offline world. Researchers have sought to use these data to explain social phenomena both particular to social media and applicable to the broader social world. This paper offers a minireview of Twitter-based research on political crowd behavior. This literature offers insight into particular social phenomena on Twitter, but often fails to use standardized methods that permit interpretation beyond individual studies. Moreover, the literature fails to ground methodologies and results in social or political theory, divorcing empirical research from the theory needed to interpret it. Rather, papers focus primarily on methodological innovations for social media analyses, but these too often fail to sufficiently demonstrate the validity of such methodologies. This minireview considers a small number of selected papers; we analyze their (often lack of) theoretical approaches, review their methodological innovations, and offer suggestions as to the relevance of their results for political scientists and sociologists.

Submitted 16 May, 2016; originally announced May 2016.

Comments: Mini-review paper, 10 pages. Draft under review

Journal ref: Front. Phys. 4:34, 2016

arXiv:1602.07199 [pdf]

doi 10.1007/978-3-319-39510-4_2

Human-Machine Networks: Towards a Typology and Profiling Framework

Authors: Aslak Wegner Eide, J. Brian Pickering, Taha Yasseri, George Bravos, Asbjørn Følstad, Vegard Engen, Milena Tsvetkova, Eric T. Meyer, Paul Walland, Marika Lüders

Abstract: In this paper we outline an initial typology and framework for the purpose of profiling human-machine networks, that is, collective structures where humans and machines interact to produce synergistic effects. Profiling a human-machine network along the dimensions of the typology is intended to facilitate access to relevant design knowledge and experience. In this way the profiling of an envisioned or existing human-machine network will both facilitate relevant design discussions and, more importantly, serve to identify the network type. We present experiences and results from two case trials: a crisis management system and a peer-to-peer reselling network. Based on the lessons learnt from the case trials we suggest potential benefits and challenges, and point out needed future work.

Submitted 1 March, 2016; v1 submitted 23 February, 2016; originally announced February 2016.

Comments: Pre-print; To be presented at the 18th International Conference on Human-Computer Interaction International, Toronto, Canada, 17 - 22 July 2016

arXiv:1602.01652 [pdf]

doi 10.1038/srep36333

Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in Online Collaboration

Authors: Milena Tsvetkova, Ruth García-Gavilanes, Taha Yasseri

Abstract: Disagreement and conflict are a fact of social life and considerably affect our well-being and productivity. Such negative interactions are rarely explicitly declared and recorded and this makes them hard for scientists to study. We overcome this challenge by investigating the patterns in the timing and configuration of contributions to a large online collaboration community. We analyze sequences of reverts of contributions to Wikipedia, the largest online encyclopedia, and investigate how often and how fast they occur compared to a null model that randomizes the order of actions to remove any systematic clustering. We find evidence that individuals systematically attack the same person and attack back their attacker; both of these interactions occur at a faster response rate than expected. We also establish that individuals come to defend an attack victim but we do not find evidence that attack victims "pay it forward" or that attackers collude to attack the same individual. We further find that high-status contributors are more likely to attack many others serially, status equals are more likely to revenge attacks back, while attacks by lower-status contributors trigger attacks forward; yet, it is the lower-status contributors who also come forward to defend third parties. The method we use can be applied to other large-scale temporal communication and collaboration networks to identify the existence of negative social interactions and other social processes.

Submitted 26 October, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

Comments: Forthcoming in Scientific Reports

Journal ref: Scientific Reports (2016) 6:36333

arXiv:1601.06805 [pdf, other]

doi 10.3389/fphy.2016.00006

P-values: misunderstood and misused

Authors: Bertie Vidgen, Taha Yasseri

Abstract: P-values are widely used in both the social and natural sciences to quantify the statistical significance of observed results. The recent surge of big data research has made the p-value an even more popular tool to test the significance of a study. However, substantial literature has been produced critiquing how p-values are used and understood. In this paper we review this recent critical literature, much of which is rooted in the life sciences, and consider its implications for social scientific research. We provide a coherent picture of what the main criticisms are, and draw together and disambiguate common themes. In particular, we explain how the False Discovery Rate is calculated, and how this differs from a p-value. We also make explicit the Bayesian nature of many recent criticisms, a dimension that is often underplayed or ignored. We conclude by identifying practical steps to help remediate some of the concerns identified. We recommend that (i) far lower significance levels are used, such as $0.01$ or $0.001$, and (ii) p-values are interpreted contextually, and situated within both the findings of the individual study and the broader field of inquiry (through, for example, meta-analyses).

Submitted 10 March, 2016; v1 submitted 25 January, 2016; originally announced January 2016.

Comments: Published in Frontiers in Physics: Vidgen B and Yasseri T (2016) P-Values: Misunderstood and Misused. Front. Phys. 4:6

Journal ref: Front. Phys. 4:6, 2016

arXiv:1511.05324 [pdf, other]

doi 10.1145/3039868

Understanding Human-Machine Networks: A Cross-Disciplinary Survey

Authors: Milena Tsvetkova, Taha Yasseri, Eric T. Meyer, J. Brian Pickering, Vegard Engen, Paul Walland, Marika Lüders, Asbjørn Følstad, George Bravos

Abstract: In the current hyper-connected era, modern Information and Communication Technology systems form sophisticated networks where not only do people interact with other people, but also machines take an increasingly visible and participatory role. Such human-machine networks (HMNs) are embedded in the daily lives of people, both for personal and professional use. They can have a significant impact by producing synergy and innovations. The challenge in designing successful HMNs is that they cannot be developed and implemented in the same manner as networks of machine nodes alone, nor following a wholly human-centric view of the network. The problem requires an interdisciplinary approach. Here, we review current research of relevance to HMNs across many disciplines. Extending the previous theoretical concepts of socio-technical systems, actor-network theory, cyber-physical-social systems, and social machines, we concentrate on the interactions among humans and between humans and machines. We identify eight types of HMNs: public-resource computing, crowdsourcing, web search engines, crowdsensing, online markets, social media, multiplayer online games and virtual worlds, and mass collaboration. We systematically select literature on each of these types and review it with a focus on implications for designing HMNs. Moreover, we discuss risks associated with HMNs and identify emerging design and development trends.

Submitted 18 January, 2017; v1 submitted 17 November, 2015; originally announced November 2015.

Comments: Forthcoming in ACM Computing Surveys

ACM Class: A.1; C.2.4; H.1.2; J.4; K.6.0

Journal ref: ACM Comput. Surv. 50, 1, 12 (2018)

arXiv:1505.01818 [pdf]

doi 10.1140/epjds/s13688-016-0083-3

Wikipedia traffic data and electoral prediction: towards theoretically informed models

Authors: Taha Yasseri, Jonathan Bright

Abstract: The aim of this article is to explore the potential use of Wikipedia page view data for predicting electoral results. Responding to previous critiques of work using socially generated data to predict elections, which have argued that these predictions take place without any understanding of the mechanism which enables them, we first develop a theoretical model which highlights why people might seek information online at election time, and how this activity might relate to overall electoral outcomes, focussing especially on how different types of parties such as new and established parties might generate different information seeking patterns. We test this model on a novel dataset drawn from a variety of countries in the 2009 and 2014 European Parliament elections. We show that while Wikipedia offers little insight into absolute vote outcomes, it offers good information about changes in both overall turnout at elections and in vote share for particular parties. These results are used to enhance existing theories about the drivers of aggregate patterns in online information seeking.

Submitted 22 January, 2016; v1 submitted 5 May, 2015; originally announced May 2015.

Comments: submitted to EPJ Data Science. Additional File 1 available at https://drive.google.com/open?id=0BxaGC-YCTO6SWkJhRXlrMVRYVlE

Journal ref: EPJ Data Science, 5: 22 (2016)

arXiv:1411.3662 [pdf]

doi 10.1038/srep06447

Structural limitations of learning in a crowd: communication vulnerability and information diffusion in MOOCs

Authors: Nabeel Gillani, Taha Yasseri, Rebecca Eynon, Isis Hjorth

Abstract: Massive Open Online Courses (MOOCs) bring together a global crowd of thousands of learners for several weeks or months. In theory, the openness and scale of MOOCs can promote iterative dialogue that facilitates group cognition and knowledge construction. Using data from two successive instances of a popular business strategy MOOC, we filter observed communication patterns to arrive at the "significant" interaction networks between learners and use complex network analysis to explore the vulnerability and information diffusion potential of the discussion forums. We find that different discussion topics and pedagogical practices promote varying levels of 1) "significant" peer-to-peer engagement, 2) participant inclusiveness in dialogue, and ultimately, 3) modularity, which impacts information diffusion to prevent a truly "global" exchange of knowledge and learning. These results indicate the structural limitations of large-scale crowd-based learning and highlight the different ways that learners in MOOCs leverage, and learn within, social contexts. We conclude by exploring how these insights may inspire new developments in online education.

Submitted 13 November, 2014; originally announced November 2014.

Comments: Pre-print version. Published version available at http://dx.doi.org/10.1038/srep06447

Journal ref: Sci Rep 4, 6447 (2014)

arXiv:1408.3562 [pdf]

doi 10.1371/journal.pone.0196068

Investigating Political Participation and Social Information Using Big Data and a Natural Experiment

Authors: Scott A. Hale, Peter John, Helen Margetts, Taha Yasseri

Abstract: Social information is particularly prominent in digital settings where the design of platforms can more easily give real-time information about the behaviour of peers and reference groups and thereby stimulate political activity. Changes to these platforms can generate natural experiments allowing an assessment of the impact of changes in social information and design on participation. This paper investigates the impact of the introduction of trending information on the homepage of the UK government petitions platform. Using interrupted time series and a regression discontinuity design, we find that the introduction of the trending feature had no statistically significant effect on the overall number of signatures per day, but that the distribution of signatures across petitions changes: the most popular petitions gain even more signatures at the expense of those with fewer signatories. We find significant differences between petitions trending at different ranks, even after controlling for each petition's individual growth prior to trending. The findings suggest a non-negligible group of individuals visit the homepage of the site looking for petitions to sign and therefore see the list of trending petitions, and a significant proportion of this group responds to the social information that it provides. These findings contribute to our understanding of how social information, and the form in which it is presented, affects individual political behaviour in digital settings.

Submitted 15 August, 2014; originally announced August 2014.

Comments: Prepared for delivery at the 2014 Annual Meeting of the American Political Science Association, August 28-31, 2014

Journal ref: PLOS ONE 13(4): e0196068 (2018)

arXiv:1405.2856 [pdf, other]

doi 10.1145/2615569.2615691

Mapping the UK Webspace: Fifteen Years of British Universities on the Web

Authors: Scott A. Hale, Taha Yasseri, Josh Cowls, Eric T. Meyer, Ralph Schroeder, Helen Margetts

Abstract: This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research.

Submitted 12 May, 2014; originally announced May 2014.

Comments: To appear in the proceeding of WebSci 2014

Journal ref: Proceedings of the 2014 ACM conference on Web science (WebSci '14). Association for Computing Machinery, New York, NY, USA, 62-70

arXiv:1403.3568 [pdf, other]

doi 10.1140/epjds/s13688-014-0007-z

Modeling Social Dynamics in a Collaborative Environment

Authors: Gerardo Iñiguez, János Török, Taha Yasseri, Kimmo Kaski, János Kertész

Abstract: Wikipedia is a prime example of today's value production in a collaborative environment. Using this example, we model the emergence, persistence and resolution of severe conflicts during collaboration by coupling opinion formation with article editing in a bounded confidence dynamics. The complex social behavior involved in editing articles is implemented as a minimal model with two basic elements; (i) individuals interact directly to share information and convince each other, and (ii) they edit a common medium to establish their own opinions. Opinions of the editors and that represented by the article are characterised by a scalar variable. When the pool of editors is fixed, three regimes can be distinguished: (a) a stable mainstream article opinion is continuously contested by editors with extremist views and there is slow convergence towards consensus, (b) the article oscillates between editors with extremist views, reaching consensus relatively fast at one of the extremes, and (c) the extremist editors are converted very fast to the mainstream opinion and the article has an erratic evolution. When editors are renewed with a certain rate, a dynamical transition occurs between different kinds of edit wars, which qualitatively reflect the dynamics of conflicts as observed in real Wikipedia data.

Submitted 14 June, 2014; v1 submitted 14 March, 2014; originally announced March 2014.

Comments: Revised version, to appear in EPJ Data Science; 19 pages 9 figures

Journal ref: EPJ Data Science 3 (1), 7 (2014)

arXiv:1312.2818 [pdf, ps, other]

doi 10.1515/itit-2014-1046

Can electoral popularity be predicted using socially generated big data?

Authors: Taha Yasseri, Jonathan Bright

Abstract: Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams, Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

Submitted 8 August, 2014; v1 submitted 10 December, 2013; originally announced December 2013.

Comments: To appear in Information Technology

Journal ref: it - Information Technology, vol. 56, no. 5, 2014, pp. 246-253

arXiv:1310.8508 [pdf, other]

doi 10.1140/epjds20

The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics

Authors: Anna Samoilenko, Taha Yasseri

Abstract: Activity of modern scholarship creates online footprints galore. Along with traditional metrics of research quality, such as citation counts, online images of researchers and institutions increasingly matter in evaluating academic impact, decisions about grant allocation, and promotion. We examined 400 biographical Wikipedia articles on academics from four scientific fields to test if being featured in the world's largest online encyclopedia is correlated with higher academic notability (assessed through citation counts). We found no statistically significant correlation between Wikipedia article metrics (length, number of edits, number of incoming links from other articles, etc.) and academic notability of the mentioned researchers. We also did not find any evidence that the scientists with better WP representation are necessarily more prominent in their fields. In addition, we inspected the Wikipedia coverage of notable scientists sampled from Thomson Reuters list of "highly cited researchers". In each of the examined fields, Wikipedia failed in covering notable scholars properly. Both findings imply that Wikipedia might be producing an inaccurate image of academics on the front end of science. By shedding light on how public perception of academic progress is formed, this study alerts that a subjective element might have been introduced into the hitherto structured system of academic evaluation.

Submitted 10 December, 2013; v1 submitted 31 October, 2013; originally announced October 2013.

Comments: To appear in EPJ Data Science. To have the Additional Files and Datasets e-mail the corresponding author

Journal ref: EPJ Data Science 2014, 3:1

arXiv:1308.0239 [pdf, other]

doi 10.1140/epjds/s13688-017-0116-6

Rapid rise and decay in petition signing

Authors: Taha Yasseri, Scott A. Hale, Helen Margetts

Abstract: Contemporary collective action, much of which involves social media and other Internet-based platforms, leaves a digital imprint which may be harvested to better understand the dynamics of mobilization. Petition signing is an example of collective action which has gained in popularity with rising use of social media and provides such data for the whole population of petition signatories for a given platform. This paper tracks the growth curves of all 20,000 petitions to the UK government petitions website (http://epetitions.direct.gov.uk) and 1,800 petitions to the US White House site (https://petitions.whitehouse.gov), analyzing the rate of growth and outreach mechanism. Previous research has suggested the importance of the first day to the ultimate success of a petition, but has not examined early growth within that day, made possible here through hourly resolution in the data. The analysis shows that the vast majority of petitions do not achieve any measure of success; over 99 percent fail to get the 10,000 signatures required for an official response and only 0.1 percent attain the 100,000 required for a parliamentary debate (0.7 percent in the US). We analyze the data through a multiplicative process model framework to explain the heterogeneous growth of signatures at the population level. We define and measure an average outreach factor for petitions and show that it decays very fast (reducing to 0.1 percent after 10 hours in the UK and 30 hours in the US). After a day or two, a petition's fate is virtually set. The findings challenge conventional analyses of collective action from economics and political science, where the production function has been assumed to follow an S-shaped curve.

Submitted 3 January, 2023; v1 submitted 1 August, 2013; originally announced August 2013.

Comments: For the final version see https://link.springer.com/content/pdf/10.1140/epjds/s13688-017-0116-6.pdf

Journal ref: EPJ Data Science (2017) 6:20

arXiv:1305.5566 [pdf]

The most controversial topics in Wikipedia: A multilingual and geographical analysis

Authors: Taha Yasseri, Anselm Spoerri, Mark Graham, János Kertész

Abstract: We present, visualize and analyse the similarities and differences between the controversial topics related to "edit wars" identified in 10 different language versions of Wikipedia. After a brief review of the related work we describe the methods developed to locate, measure, and categorize the controversial topics in the different languages. Visualizations of the degree of overlap between the top 100 lists of most controversial articles in different languages and the content related to geographical locations will be presented. We discuss what the presented analysis and visualizations can tell us about the multicultural aspects of Wikipedia and practices of peer-production. Our results indicate that Wikipedia is more than just an encyclopaedia; it is also a window into convergent and divergent social-spatial priorities, interests and preferences.

Submitted 8 July, 2013; v1 submitted 23 May, 2013; originally announced May 2013.

Comments: This is a draft of a book chapter to be published in 2014 by Scarecrow Press. Please cite as: Yasseri T., Spoerri A., Graham M., and Kertész J., The most controversial topics in Wikipedia: A multilingual and geographical analysis. In: Fichman P., Hara N., editors, Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press (2014)
