Search | arXiv e-print repository

Human-machine social systems

Authors: Milena Tsvetkova, Taha Yasseri, Niccolo Pescetelli, Tobias Werner

Abstract: From fake accounts on social media and generative-AI bots such as ChatGPT to high-frequency trading algorithms on financial markets and self-driving vehicles on the streets, robots, bots, and algorithms are proliferating and permeating our communication channels, social interactions, economic transactions, and transportation arteries. Networks of multiple interdependent and interacting humans and autonomous machines constitute complex adaptive social systems where the collective outcomes cannot be simply deduced from either human or machine behavior alone. Under this paradigm, we review recent experimental, theoretical, and observational research from across a range of disciplines - robotics, human-computer interaction, web science, complexity science, computational social science, finance, economics, political science, social psychology, and sociology. We identify general dynamics and patterns in situations of competition, coordination, cooperation, contagion, and collective decision-making, and contextualize them in four prominent existing human-machine communities: high-frequency trading markets, the social media platform formerly known as Twitter, the open-collaboration encyclopedia Wikipedia, and the news aggregation and discussion community Reddit. We conclude with suggestions for the research, design, and governance of human-machine social systems, which are necessary to reduce misinformation, prevent financial crashes, improve road safety, overcome labor market disruptions, and enable a better human future.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 44 pages, 2 figures

ACM Class: A.1; C.2.4; H.1.2; J.4; K.4.0; K.6.0

arXiv:2303.10036 [pdf, other]

Individual differences in knowledge network navigation

Authors: Manran Zhu, Taha Yasseri, János Kertész

Abstract: With the rapid accumulation of online information, efficient web navigation has grown vital yet challenging. To create an easily navigable cyberspace catering to diverse demographics, understanding how people navigate differently is paramount. While previous research has unveiled individual differences in spatial navigation, such differences in knowledge space navigation remain sparse. To bridge this gap, we conducted an online experiment where participants played a navigation game on Wikipedia and completed personal information questionnaires. Our analysis shows that age negatively affects knowledge space navigation performance, while multilingualism enhances it. Under time pressure, participants' performance improves across trials and males outperform females, an effect not observed in games without time pressure. In our experiment, successful route-finding is usually not related to abilities of innovative exploration of routes. Our results underline the importance of age, multilingualism and time constraint in the knowledge space navigation.

Submitted 19 March, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: 14 pages, 4 figures

arXiv:2211.07616 [pdf, other]

Between News and History: Identifying Networked Topics of Collective Attention on Wikipedia

Authors: Patrick Gildersleve, Renaud Lambiotte, Taha Yasseri

Abstract: The digital information landscape has introduced a new dimension to understanding how we collectively react to new information and preserve it at the societal level. This, together with the emergence of platforms such as Wikipedia, has challenged traditional views on the relationship between current events and historical accounts of events, with an ever-shrinking divide between "news" and "history". Wikipedia's place as the Internet's primary reference work thus poses the question of how it represents both traditional encyclopaedic knowledge and evolving important news stories. In other words, how is information on and attention towards current events integrated into the existing topical structures of Wikipedia? To address this we develop a temporal community detection approach towards topic detection that takes into account both short term dynamics of attention as well as long term article network structures. We apply this method to a dataset of one year of current events on Wikipedia to identify clusters distinct from those that would be found solely from page view time series correlations or static network structure. We are able to resolve the topics that more strongly reflect unfolding current events vs more established knowledge by the relative importance of collective attention dynamics vs link structures. We also offer important developments by identifying and describing the emergent topics on Wikipedia.
This work provides a means of distinguishing how these information and attention clusters are related to Wikipedia's twin faces of encyclopaedic knowledge and current events -- crucial to understanding the production and consumption of knowledge in the digital age.

Submitted 12 May, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2207.01352 [pdf, other]

doi 10.1038/s41598-023-39035-3

Terrorist attacks sharpen the binary perception of "Us" vs. "Them"

Authors: Milan Jović, Lovro Šubelj, Tea Golob, Matej Makarovič, Taha Yasseri, Danijela Boberić Krstićev, Srdjan Škrbić, Zoran Levnajić

Abstract: Terrorist attacks not only harm citizens but also shift their attention, which has long-lasting impacts on public opinion and government policies. Yet measuring the changes in public attention beyond media coverage has been methodologically challenging. Here we approach this problem by starting from Wikipedia's répertoire of 5.8 million articles and a sample of 15 recent terrorist attacks. We deploy a complex exclusion procedure to identify topics and themes that consistently received a significant increase in attention due to these incidents. Examining their contents reveals a clear picture: terrorist attacks foster establishing a sharp boundary between "Us" (the target society) and "Them" (the terrorist as the enemy). In the midst of this, one seeks to construct identities of both sides. This triggers curiosity to learn more about "Them" and soul-search for a clearer understanding of "Us". This systematic analysis of public reactions to disruptive events could help mitigate their societal consequences.

Submitted 3 August, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Peer-reviewed; Published

Journal ref: Sci Rep 13, 12451 (2023)

arXiv:2207.01042 [pdf]

doi 10.1016/bs.pbr.2022.07.001

Collective Memory in the Digital Age

Authors: Taha Yasseri, Patrick Gildersleve, Lea David

Abstract: The digital transformation of our societies and in particular information and communication technologies have revolutionized how we generate, communicate, and acquire information. Collective memory as a core and unifying force in our societies has not been an exception among many societal concepts which have been revolutionized through digital transformation. In this chapter, we have distinguished between "the digitalized collective memory" and "collective memory in the digital age". In addition to discussing these two main concepts, we discuss how digital tools and trace data can open doorways into the study of collective memory that is formed inside and outside of the digital space.

Submitted 3 July, 2022; originally announced July 2022.

Comments: This is a preprint of a Chapter to appear in "Collective Memory" Edited by Shane O'Mara and to be published by Elsevier in 2022. Please cite as: Yasseri, T., Gildersleve, P., and David, L. (2022), Collective Memory in the Digital Age, In S. O'Mara (Ed.), Collective Memory, Elsevier

Journal ref: Progress in Brain Research 274-1, pp 203-226 (2022)

arXiv:2104.13754 [pdf]

doi 10.1145/3578645

Can crowdsourcing rescue the social marketplace of ideas?

Authors: Taha Yasseri, Filippo Menczer

Abstract: Facebook and Twitter recently announced community-based review platforms to address misinformation. We provide an overview of the potential affordances of such community-based approaches to content moderation based on past research and preliminary analysis of Twitter's Birdwatch data. While our analysis generally supports a community-based approach to content moderation, it also warns against potential pitfalls, particularly when the implementation of the new infrastructure focuses on crowd-based "validation" rather than "collaboration." We call for multidisciplinary research utilizing methods from complex systems studies, behavioural sociology, and computational social science to advance the research on crowd-based content moderation.

Submitted 19 December, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: In Press in Communications of the ACM (CACM)

Journal ref: Communications of the ACM (2023)

arXiv:2101.02695 [pdf]

doi 10.3389/fphy.2021.650720

Gender Imbalance and Spatiotemporal Patterns of Contributions to Citizen Science Projects: the case of Zooniverse

Authors: Khairunnisa Ibrahim, Samuel Khodursky, Taha Yasseri

Abstract: Citizen Science is research undertaken by professional scientists and members of the public collaboratively. Despite numerous benefits of citizen science for both the advancement of science and the community of the citizen scientists, there is still no comprehensive knowledge of patterns of contributions, and the demography of contributors to citizen science projects. In this paper we provide a first overview of spatiotemporal and gender distribution of citizen science workforce by analyzing 54 million classifications contributed by more than 340 thousand citizen science volunteers from 198 countries to one of the largest citizen science platforms, Zooniverse. First we report on the uneven geographical distribution of the citizen scientist and model the variations among countries based on the socio-economic conditions as well as the level of research investment in each country. Analyzing the temporal features of contributions, we report on high "burstiness" of participation instances as well as the leisurely nature of participation suggested by the time of the day that the citizen scientists were the most active. Finally, we discuss the gender imbalance among citizen scientists (about 30% female) and compare it with other collaborative projects as well as the gender distribution in more formal scientific activities.
Citizen science projects need further attention from outside of the academic community, and our findings can help attract the attention of public and private stakeholders, as well as to inform the design of the platforms and science policy making processes.

Submitted 7 January, 2021; originally announced January 2021.

Comments: Under Review

Journal ref: Front. Phys. 9:650720 (2021)

arXiv:2101.00296 [pdf]

Tweeting for the Cause: Network analysis of UK petition sharing

Authors: Peter Cihon, Taha Yasseri, Scott Hale, Helen Margetts

Abstract: Online government petitions represent a new data-rich mode of political participation. This work examines the thus far understudied dynamics of sharing petitions on social media in order to garner signatures and, ultimately, a government response. Using 20 months of Twitter data comprising over 1 million tweets linking to a petition, we perform analyses of networks constructed of petitions and supporters on Twitter, revealing implicit social dynamics therein. We find that Twitter users do not exclusively share petitions on one issue nor do they share exclusively popular petitions. Among the over 240,000 Twitter users, we find latent support groups, with the most central users primarily being politically active "average" individuals. Twitter as a platform for sharing government petitions, thus, appears to hold potential to foster the creation of and coordination among a new form of latent support interest groups online.

Submitted 1 January, 2021; originally announced January 2021.

Comments: Presented at IPP2016 Conference, Oxford, UK. http://blogs.oii.ox.ac.uk/ipp-conference/2016.html

arXiv:2009.11038 [pdf, other]

The cost of coordination can exceed the benefit of collaboration in performing complex tasks

Authors: Vince J. Straub, Milena Tsvetkova, Taha Yasseri

Abstract: Humans and other intelligent agents often rely on collective decision making based on an intuition that groups outperform individuals. However, at present, we lack a complete theoretical understanding of when groups perform better. Here we examine performance in collective decision-making in the context of a real-world citizen science task environment in which individuals with manipulated differences in task-relevant training collaborated. We find 1) dyads gradually improve in performance but do not experience a collective benefit compared to individuals in most situations; 2) the cost of coordination to efficiency and speed that results when switching to a dyadic context after training individually is consistently larger than the leverage of having a partner, even if they are expertly trained in that task; and 3) on the most complex tasks having an additional expert in the dyad who is adequately trained improves accuracy. These findings highlight that the extent of training received by an individual, the complexity of the task at hand, and the desired performance indicator are all critical factors that need to be accounted for when weighing up the benefits of collective decision-making.

Submitted 27 January, 2023; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: in Press in Collective Intelligence. Please cite the published version using the DOI below

arXiv:2001.02878 [pdf]

doi 10.1080/0022250X.2020.1818078

Positive algorithmic bias cannot stop fragmentation in homophilic networks

Authors: Chris Blex, Taha Yasseri

Abstract: Fragmentation, echo chambers, and their amelioration in social networks have been a growing concern in the academic and non-academic world. This paper shows how, under the assumption of homophily, echo chambers and fragmentation are system-immanent phenomena of highly flexible social networks, even under ideal conditions for heterogeneity. We achieve this by finding an analytical, network-based solution to the Schelling model and by proving that weak ties do not hinder the process. Furthermore, we derive that no level of positive algorithmic bias in the form of rewiring is capable of preventing fragmentation and its effect on reducing the fragmentation speed is negligible.

Submitted 9 September, 2021; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: Cite as: Chris Blex & Taha Yasseri (2020) Positive algorithmic bias cannot stop fragmentation in homophilic networks, The Journal of Mathematical Sociology, DOI: 10.1080/0022250X.2020.1818078

Journal ref: The Journal of Mathematical Sociology, 46:1, 80-97 (2022)

arXiv:1910.05794 [pdf]

doi 10.1080/18335330.2021.1892166

Islamophobes are not all the same! A study of far right actors on Twitter

Authors: Bertie Vidgen, Taha Yasseri, Helen Margetts

Abstract: Far-right actors are often purveyors of Islamophobic hate speech online, using social media to spread divisive and prejudiced messages which can stir up intergroup tensions and conflict. Hateful content can inflict harm on targeted victims, create a sense of fear amongst communities and stir up intergroup tensions and conflict. Accordingly, there is a pressing need to better understand at a granular level how Islamophobia manifests online and who produces it. We investigate the dynamics of Islamophobia amongst followers of a prominent UK far right political party on Twitter, the British National Party. Analysing a new data set of five million tweets, collected over a period of one year, using a machine learning classifier and latent Markov modelling, we identify seven types of Islamophobic far right actors, capturing qualitative, quantitative and temporal differences in their behaviour. Notably, we show that a small number of users are responsible for most of the Islamophobia that we observe. We then discuss the policy implications of this typology in the context of social media regulation.

Submitted 8 March, 2021; v1 submitted 13 October, 2019; originally announced October 2019.

Journal ref: Journal of Policing, Intelligence and Counter Terrorism, 17:1, 1-23 (2022)

arXiv:1908.08991 [pdf, other]

doi 10.1098/rsos.210617

Football is becoming more predictable; Network analysis of 88 thousands matches in 11 major leagues

Authors: Victor Martins Maimone, Taha Yasseri

Abstract: In recent years excessive monetization of football and professionalism among the players has been argued to have affected the quality of the match in different ways. On the one hand, playing football has become a high-income profession and the players are highly motivated; on the other hand, stronger teams have higher incomes and therefore afford better players leading to an even stronger appearance in tournaments that can make the game more imbalanced and hence predictable. To quantify and document this observation, in this work we take a minimalist network science approach to measure the predictability of football over 26 years in major European leagues. We show that over time, the games in major leagues have indeed become more predictable. We provide further support for this observation by showing that inequality between teams has increased and the home-field advantage has been vanishing ubiquitously. We do not include any direct analysis on the effects of monetization on football's predictability or therefore, lack of excitement, however, we propose several hypotheses which could be tested in future analyses.

Submitted 6 July, 2022; v1 submitted 23 August, 2019; originally announced August 2019.

Comments: revised version - before publication in Royal Society Open Science

Journal ref: Royal Society Open Science, 8(12), 210617 (2021)

arXiv:1908.08859 [pdf, other]

doi 10.1007/s41109-021-00379-2

Dissent and Rebellion in the House of Commons: A Social Network Analysis of Brexit-Related Divisions in the 57th Parliament

Authors: Carla Intal, Taha Yasseri

Abstract: The British party system is known for its discipline and cohesion, but it remains wedged on one issue: European integration. We offer a methodology using social network analysis that considers the individual interactions of MPs in the voting process. Using public Parliamentary records, we scraped votes of individual MPs in the 57th parliament (June 2017 to April 2019), computed pairwise similarity scores and calculated rebellion metrics based on eigenvector centralities. Comparing the networks of Brexit- and non-Brexit divisions, our methodology was able to detect a significant difference in eurosceptic behaviour for the former, and using a rebellion metric we predicted how MPs would vote in a forthcoming Brexit deal with over 90% accuracy.

Submitted 27 May, 2021; v1 submitted 23 August, 2019; originally announced August 2019.

Comments: Published

Journal ref: Appl Netw Sci 6, 36 (2021)

arXiv:1907.01536 [pdf]

doi 10.1007/s11077-020-09395-y

What, When and Where of petitions submitted to the UK Government during a time of chaos

Authors: Bertie Vidgen, Taha Yasseri

Abstract: In times marked by political turbulence and uncertainty, as well as increasing divisiveness and hyperpartisanship, Governments need to use every tool at their disposal to understand and respond to the concerns of their citizens. We study issues raised by the UK public to the Government during 2015-2017 (surrounding the UK EU-membership referendum), mining public opinion from a dataset of 10,950 petitions (representing 30.5 million signatures). We extract the main issues with a ground-up natural language processing (NLP) method, latent Dirichlet allocation (LDA). We then investigate their temporal dynamics and geographic features. We show that whilst the popularity of some issues is stable across the two years, others are highly influenced by external events, such as the referendum in June 2016. We also study the relationship between petitions' issues and where their signatories are geographically located. We show that some issues receive support from across the whole country but others are far more local. We then identify six distinct clusters of constituencies based on the issues which constituents sign. Finally, we validate our approach by comparing the petitions' issues with the top issues reported in Ipsos MORI survey data. These results show the huge power of computationally analyzing petitions to understand not only what issues citizens are concerned about but also when and from where.

Submitted 2 July, 2019; originally announced July 2019.

Comments: Preprint; under review

Journal ref: Policy Sci 53, 535-557 (2020)

arXiv:1904.06310 [pdf, other]

Female scholars need to achieve more for equal public recognition

Authors: Menno H. Schellekens, Floris Holstege, Taha Yasseri

Abstract: Different kinds of "gender gap" have been reported in different walks of the scientific life, almost always favouring male scientists over females. In this work, for the first time, we present a large-scale empirical analysis to ask whether female scientists with the same level of scientific accomplishment are as likely as males to be recognised. We particularly focus on Wikipedia, the open online encyclopedia whose open nature allows us to have a proxy of community recognition. We calculate the probability of appearing on Wikipedia as a scientist for both male and female scholars in three different fields. We find that women in Physics, Economics and Philosophy are considerably less likely than men to be recognised on Wikipedia across all levels of achievement.

Submitted 16 April, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

Comments: Under review

arXiv:1810.05485 [pdf, other]

doi 10.1098/rsos.182103

Social capital predicts corruption risk in towns

Authors: Johannes Wachs, Taha Yasseri, Balázs Lengyel, János Kertész

Abstract: Corruption is a social plague: gains accrue to small groups, while its costs are borne by everyone. Significant variation in its level between and within countries suggests a relationship between social structure and the prevalence of corruption, yet, large scale empirical studies thereof have been missing due to lack of data. In this paper we relate the structural characteristics of social capital of towns with corruption in their local governments. Using datasets from Hungary, we quantify corruption risk by suppressed competition and lack of transparency in the town's awarded public contracts. We characterize social capital using social network data from a popular online platform. Controlling for social, economic, and political factors, we find that settlements with fragmented social networks, indicating an excess of \textit{bonding social capital} have higher corruption risk and towns with more diverse external connectivity, suggesting a surplus of \textit{bridging social capital} are less exposed to corruption. We interpret fragmentation as fostering in-group favoritism and conformity, which increase corruption, while diversity facilitates impartiality in public life and stifles corruption.

Submitted 12 October, 2018; originally announced October 2018.

Comments: Submitted

Journal ref: Royal Society Open Science, 2019

arXiv:1809.10032 [pdf]

doi 10.1007/s42001-021-00132-w

Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis

Authors: Rachel Dinh, Patrick Gildersleve, Chris Blex, Taha Yasseri

Abstract: Have we become more tolerant of dating people of different social backgrounds compared to ten years ago? Has the rise of online dating exacerbated or alleviated gender inequalities in modern courtship? Are the most attractive people on these platforms necessarily the most successful? In this work, we examine the mate preferences and communication patterns of male and female users of the online dating site eHarmony over the past decade to identify how attitudes and behaviors have changed over this time period. While other studies have investigated disparities in user behavior between male and female users, this study is unique in its longitudinal approach. Specifically, we analyze how men and women differ in their preferences for certain traits in potential partners and how those preferences have changed over time. The second line of inquiry investigates to what extent physical attractiveness determines the rate of messages a user receives, and how this relationship varies between men and women. Thirdly, we explore whether online dating practices between males and females have become more equal over time or if biases and inequalities have remained constant (or increased). Fourthly, we study the behavioural traits in sending and replying to messages based on one's own experience of receiving messages and being replied to. Finally, we found that similarity between profiles is not a predictor for success except for the number of children and smoking habits.
This work could have broader implications for shifting gender norms and social attitudes, reflected in online courtship rituals. Apart from the data-based research, we connect the results to existing theories that concern the role of ICTs in societal change. As searching for love online becomes increasingly common across generations and geographies, these findings may shed light on how people can build relationships through the Internet.

Submitted 28 June, 2020; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: Preprint, under review

Journal ref: J Comput Soc Sc 5, 401-426 (2022)

arXiv:1711.10380 [pdf]

Social Media, Money, and Politics: Campaign Finance in the 2016 US Congressional Cycle

Authors: Lily McElwee, Taha Yasseri

Abstract: With social media penetration deepening among both citizens and political figures, there is a pressing need to understand whether and how political use of major platforms is electorally influential. In particular, the literature focused on campaign usage is thin and often describes the engagement strategies of politicians or attempts to quantify the impact of social media engagement on political learning, participation, or voting. Few have considered implications for campaign fundraising despite its recognized importance in American politics. This paper is the first to quantify a financial payoff for social media campaigning. Drawing on candidate-level data from Facebook and Twitter, Google Trends, Wikipedia page views, and Federal Election Commission (FEC) donation records, we analyze the relationship between the topic and volume of social media content and campaign funds received by all 108 candidates in the 2016 US Senate general elections. By applying an unsupervised learning approach to identify themes in candidate content across the platforms, we find that more frequent posting overall and of issue-related content are associated with higher donation income when controlling for incumbency, state population, and information-seeking about a candidate, though campaigning-related content has a stronger effect than the latter when the number rather than value of donations is considered.

Submitted 28 November, 2017; originally announced November 2017.

Comments: Under review. Main article + Supplementary Information

arXiv:1711.09074 [pdf]

doi 10.3389/fdigh.2018.00028

Topic Modelling of Everyday Sexism Project Entries

Authors: Sophie Melville, Kathryn Eccles, Taha Yasseri

Abstract: The Everyday Sexism Project documents everyday examples of sexism reported by volunteer contributors from all around the world. It collected 100,000 entries in 13+ languages within the first 3 years of its existence. The content of reports in various languages submitted to Everyday Sexism is a valuable source of crowdsourced information with great potential for feminist and gender studies. In this paper, we take a computational approach to analyze the content of reports. We use topic-modelling techniques to extract emerging topics and concepts from the reports, and to map the semantic relations between those topics. The resulting picture closely resembles and adds to that arrived at through qualitative analysis, showing that this form of topic modeling could be useful for sifting through datasets that had not previously been subject to any analysis. More precisely, we come up with a map of topics for two different resolutions of our topic model and discuss the connection between the identified topics. In the low resolution picture, for instance, we found Public space/Street, Online, Work related/Office, Transport, School, Media harassment, and Domestic abuse. Among these, the strongest connection is between Public space/Street harassment and Domestic abuse and sexism in personal relationships. The strength of the relationships between topics illustrates the fluid and ubiquitous nature of sexism, with no single experience being unrelated to another.

Submitted 5 April, 2018; v1 submitted 24 November, 2017; originally announced November 2017.

Comments: preprint, under review

Journal ref: Front. Digit. Humanit. 5:28 (2019)

arXiv:1711.05701 [pdf]

doi 10.1016/j.socnet.2019.10.005

Social Complex Contagion in Music Listenership: A Natural Experiment with 1.3 Million Participants

Authors: John Ternovski, Taha Yasseri

Abstract: Can live music events generate complex contagion in music streaming? This paper finds evidence in the affirmative, but only for the most popular artists. We generate a novel dataset from Last.fm, a music tracking website, to analyse the listenership history of 1.3 million users over a two-month time horizon. We use daily play counts along with event attendance data to run a regression discontinuity analysis in order to show the causal impact of concert attendance on music listenership among attendees and their friends network. First, we show that attending a music artist's live concert increases that artist's listenership among the attendees of the concert by approximately 1 song per day per attendee (p-value<0.001). Moreover, we show that this effect is contagious and can spread to users who did not attend the event. However, the extent of contagion depends on the type of artist. We only observe contagious increases in listenership for well-established, popular artists (.06 more daily plays per friend of an attendee [p<0.001]), while the effect is absent for emerging stars. We also show that the contagion effect size increases monotonically with the number of friends who have attended the live event.

Submitted 15 November, 2017; originally announced November 2017.

Comments: Preprint, under review

Journal ref: Social Networks, Volume 61, 144-152, 2020

arXiv:1710.03326 [pdf, other]

doi 10.1007/978-3-319-73198-8_23

Inspiration, Captivation, and Misdirection: Emergent Properties in Networks of Online Navigation

Authors: Patrick Gildersleve, Taha Yasseri

Abstract: The World Wide Web (WWW) has fundamentally changed the ways billions of people are able to access information. Thus, understanding how people seek information online is an important issue of study. Wikipedia is a hugely important part of information provision on the web, with hundreds of millions of users browsing and contributing to its network of knowledge. The study of navigational behaviour on Wikipedia, due to the site's popularity and breadth of content, can reveal more general information seeking patterns that may be applied beyond Wikipedia and the Web. Our work addresses the relative shortcomings of existing literature in relating how information structure influences patterns of navigation online. We study aggregated clickstream data for articles on the English Wikipedia in the form of a weighted, directed navigational network. We introduce two parameters that describe how articles act to source and spread traffic through the network, based on their in/out strength and entropy. From these, we construct a navigational phase space where different article types occupy different, distinct regions, indicating how the structure of information online has differential effects on patterns of navigation. Finally, we go on to suggest applications for this analysis in identifying and correcting deficiencies in the Wikipedia page network that may also be adapted to more general information networks.

Submitted 12 October, 2017; v1 submitted 9 October, 2017; originally announced October 2017.

Journal ref: CompleNet 2018. Springer Proceedings in Complexity. Springer, Cham
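
The abstract above characterises articles by their in/out strength and entropy in a weighted, directed clickstream network. The following is a minimal sketch of those two ingredients on a toy network. The function names, the `{(source, target): clicks}` input format, and the choice of Shannon entropy over outgoing clicks are assumptions made for illustration; they are not the authors' exact parameter definitions.

```python
import math
from collections import defaultdict

def strengths(clicks):
    """Return (in_strength, out_strength) dicts from {(src, dst): count}."""
    s_in, s_out = defaultdict(int), defaultdict(int)
    for (src, dst), n in clicks.items():
        s_out[src] += n   # traffic leaving src
        s_in[dst] += n    # traffic arriving at dst
    return dict(s_in), dict(s_out)

def out_entropy(clicks, node):
    """Shannon entropy (bits) of how a node's outgoing clicks spread over targets."""
    weights = [n for (src, _), n in clicks.items() if src == node]
    total = sum(weights)
    if total == 0:
        return 0.0  # no outgoing traffic recorded
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

# Hypothetical toy clickstream: article A sends traffic evenly to four targets.
toy = {("A", "B"): 10, ("A", "C"): 10, ("A", "D"): 10, ("A", "E"): 10, ("B", "A"): 5}
s_in, s_out = strengths(toy)
```

In this toy case, A has high out-strength relative to in-strength and spreads its traffic evenly, whereas B forwards everything to a single target and so has zero outgoing entropy.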

arXiv:1703.08781 [pdf, ps, other]

Emergence of world-stock-market network

Authors: M. Saeedian, T. Jamali, M. Z. Kamali, H. Bayani, T. Yasseri, G. R. Jafari

Abstract: In the age of globalization, it is natural that the stock market of each country is not independent from the other markets. In this case, collective behavior could emerge from their mutual dependency. This article studies the collective behavior of a set of forty influential markets in the world economy with the aim of exploring a global financial structure that could be called world-stock-market network. Towards this end, we analyze the cross-correlation matrix of the indices of these forty markets using Random Matrix Theory (RMT). We find the degree of collective behavior among the markets and the share of each market in their structural formation. This finding, together with the results obtained from the same calculation on four stock markets, reinforces the idea of a world financial market. Finally, we draw the dendrogram of the cross-correlation matrix to make communities in this abstract global market visible. The dendrogram, drawn by at least thirty percent of correlation, shows that the world financial market comprises three communities each of which includes stock markets with geographical proximity.

Submitted 26 March, 2017; originally announced March 2017.
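
The abstract above applies Random Matrix Theory to a cross-correlation matrix. A common step in such analyses is to compare the matrix eigenvalues with the Marchenko-Pastur bounds expected for uncorrelated series; eigenvalues above the upper bound are read as collective structure. The sketch below assumes standardised (unit-variance) series and more observations than series. The function names and the 1000-observation figure are hypothetical, and the abstract does not state that this exact procedure was used.

```python
import math

def marchenko_pastur_bounds(n_series, n_obs):
    """Eigenvalue bounds for the correlation matrix of uncorrelated unit-variance series."""
    root = math.sqrt(n_series / n_obs)
    return (1 - root) ** 2, (1 + root) ** 2

def eigenvalues_above_noise(eigenvalues, n_series, n_obs):
    """Keep only eigenvalues exceeding the upper random-matrix bound."""
    _, lam_max = marchenko_pastur_bounds(n_series, n_obs)
    return [lam for lam in eigenvalues if lam > lam_max]

# Forty markets (from the abstract) and a hypothetical 1000 observations per index.
lam_min, lam_max = marchenko_pastur_bounds(40, 1000)
```

The bounds are asymptotic, so for short time series they are only a rough guide to which eigenvalues carry information.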

arXiv:1609.04285 [pdf]

doi 10.1371/journal.pone.0171774

Even Good Bots Fight: The Case of Wikipedia

Authors: Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, Taha Yasseri

Abstract: In recent years, there has been a huge increase in the number of bots online, varying from Web crawlers for search engines, to chatbots for online customer service, spambots on social media, and content-editing bots in online collaboration communities. The online world has turned into an ecosystem of bots. However, our knowledge of how these automated agents are interacting with each other is rather poor. Bots are predictable automatons that do not have the capacity for emotions, meaning-making, creativity, and sociality and it is hence natural to expect interactions between bots to be relatively predictable and uneventful. In this article, we analyze the interactions between bots that edit articles on Wikipedia. We track the extent to which bots undid each other's edits over the period 2001-2010, model how pairs of bots interact over time, and identify different types of interaction trajectories. We find that, although Wikipedia bots are intended to support the encyclopedia, they often undo each other's edits and these sterile "fights" may sometimes continue for years. Unlike humans on Wikipedia, bots' interactions tend to occur over longer periods of time and to be more reciprocated. Yet, just like humans, bots in different cultural environments may behave differently. Our research suggests that even relatively "dumb" bots may give rise to complex interactions, and this carries important implications for Artificial Intelligence research.
Understanding what affects bot-bot interactions is crucial for managing social media well, providing adequate cyber-security, and designing well functioning autonomous vehicles.

Submitted 27 February, 2017; v1 submitted 14 September, 2016; originally announced September 2016.

Comments: Published in PLOS ONE

Journal ref: PLoS ONE (2017) 12(2):e0171774

arXiv:1609.02621 [pdf, ps, other]

doi 10.1126/sciadv.1602368

Memory Remains: Understanding Collective Memory in the Digital Age

Authors: Ruth García-Gavilanes, Anders Mollgaard, Milena Tsvetkova, Taha Yasseri

Abstract: Recently developed information communication technologies, particularly the Internet, have affected how we, both as individuals and as a society, create, store, and recall information. The Internet also provides us with a great opportunity to study memory using transactional large scale data, in a quantitative framework similar to the practice in statistical physics. In this project, we make use of online data by analysing viewership statistics of Wikipedia articles on aircraft crashes. We study the relation between recent events and past events and particularly focus on understanding memory triggering patterns. We devise a quantitative model that explains the flow of viewership from a current event to past events based on similarity in time, geography, topic, and the hyperlink structure of Wikipedia articles. We show that on average the secondary flow of attention to past events generated by such remembering processes is larger than the primary attention flow to the current event. We are the first to report these cascading effects.

Submitted 8 September, 2016; originally announced September 2016.

Comments: Under Review

Journal ref: Science Advances 3(4), 2017

arXiv:1607.08127 [pdf, other]

doi 10.1371/journal.pone.0173561

Understanding and coping with extremism in an online collaborative environment

Authors: Csilla Rudas, Olivér Surányi, Taha Yasseri, János Török

Abstract: The Internet has provided us with great opportunities for large scale collaborative public good projects. Wikipedia is a predominant example of such projects where conflicts emerge and get resolved through bottom-up mechanisms leading to the emergence of the largest encyclopedia in human history. Disaccord arises whenever editors with different opinions try to produce an article reflecting a consensual view. The debates are mainly heated by editors with extremist views. Using a model of common value production, we show that the consensus can only be reached if extremist groups can actively take part in the discussion and if their views are also represented in the common outcome, at least temporarily. We show that banning problematic editors mostly hinders the consensus as it delays discussion and thus the whole consensus building process. To validate the model, relevant quantities are measured both in simulations and Wikipedia which show satisfactory agreement. We also consider the role of direct communication between editors both in the model and in Wikipedia data (by analysing the Wikipedia talk pages). While the model suggests that in certain conditions there is an optimal rate of "talking" vs "editing", it correctly predicts that in the current settings of Wikipedia, more activity in talk pages is associated with more controversy.

Submitted 27 July, 2016; originally announced July 2016.

Comments: 16 pages, 9 figures

arXiv:1607.07495 [pdf]

doi 10.1002/9781118998205.ch12

Understanding Communication Patterns in MOOCs: Combining Data Mining and qualitative methods

Authors: Rebecca Eynon, Isis Hjorth, Taha Yasseri, Nabeel Gillani

Abstract: Massive Open Online Courses (MOOCs) offer unprecedented opportunities to learn at scale. Within a few years, the phenomenon of crowd-based learning has gained enormous popularity with millions of learners across the globe participating in courses ranging from Popular Music to Astrophysics. They have captured the imaginations of many, attracting significant media attention - with The New York Times naming 2012 "The Year of the MOOC." For those engaged in learning analytics and educational data mining, MOOCs have provided an exciting opportunity to develop innovative methodologies that harness big data in education.

Submitted 25 July, 2016; originally announced July 2016.

Comments: Preprint of a chapter to appear in "Data Mining and Learning Analytics: Applications in Educational Research"

arXiv:1607.03320 [pdf]

What Happens After You Both Swipe Right: A Statistical Description of Mobile Dating Communications

Authors: Jennie Zhang, Taha Yasseri

Abstract: Mobile dating applications (MDAs) have skyrocketed in popularity in the last few years, with popular MDA Tinder alone matching 26 million pairs of users per day. In addition to becoming an influential part of modern dating culture, MDAs facilitate a unique form of mediated communication: dyadic mobile text messages between pairs of users who are not already acquainted. Furthermore, mobile dating has paved the way for analysis of these digital interactions via massive sets of data generated by the instant matching and messaging functions of its many platforms at an unprecedented scale. This paper looks at one of these sets of data: metadata of approximately two million conversations, containing 19 million messages, exchanged between 400,000 heterosexual users on an MDA. Through computational analysis methods, this study offers the very first large scale quantitative depiction of mobile dating as a whole. We report on differences in how heterosexual male and female users communicate with each other on MDAs, differences in behaviors of dyads of varying degrees of social separation, and factors leading to "success", operationalized by the exchange of phone numbers between matched users. For instance, we report that men initiate 79% of conversations, and while about half of the initial messages are responded to, conversations initiated by men are more likely to be reciprocated. We also report that the length of conversations, the waiting times, and the length of messages have fat-tailed distributions.
That said, the majority of reciprocated conversations lead to a phone number exchange within the first 20 messages.

Submitted 12 July, 2016; originally announced July 2016.

Comments: Under Review, 22 pages, 8 tables, 8 figures

arXiv:1606.08829 [pdf, other]

doi 10.1098/rsos.160460

Dynamics and Biases of Online Attention: The Case of Aircraft Crashes

Authors: Ruth García-Gavilanes, Milena Tsvetkova, Taha Yasseri

Abstract: The Internet not only has changed the dynamics of our collective attention, but also through the transactional log of online activities, provides us with the opportunity to study attention dynamics at scale. In this paper, we particularly study attention to aircraft incidents and accidents using Wikipedia transactional data in two different language editions, English and Spanish. We study both the editorial activities on and the viewership of the articles about airline crashes. We analyse how the level of attention is influenced by different parameters such as number of deaths, airline region, and event locale and date. We find evidence that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region but only for events with a high number of deaths. Finally, we show that the rate and time span of the decay of attention are independent of the number of deaths and a fast decay within about a week seems to be universal. We discuss the implications of these findings in the context of attention bias.

Submitted 11 September, 2016; v1 submitted 28 June, 2016; originally announced June 2016.

Comments: Accepted for publication in Royal Society Open Science

Journal ref: R. Soc. Open Sci. 2016 3 160460 (12 October 2016)

arXiv:1605.05139 [pdf]

doi 10.3389/fdigh.2017.00011

Two Roads Diverged: A Semantic Network Analysis of Guanxi on Twitter

Authors: Pu Yan, Taha Yasseri

Abstract: Guanxi, roughly translated as "social connection", is a term commonly used in the Chinese language. In this research, we employed a linguistic approach to explore popular discourses on Guanxi. Although sharing the same Confucian roots, Chinese communities inside and outside Mainland China have undergone different historical trajectories. Hence, we took a comparative approach to examine guanxi in Mainland China and in Taiwan, Hong Kong, and Macau (TW-HK-M). Comparing guanxi discourses in two Chinese societies aims at revealing the divergence of guanxi culture. The data for this research were collected on Twitter over a three-week period by searching tweets containing guanxi written in Simplified Chinese characters and in Traditional Chinese characters. After building, visualising, and conducting community detection on both semantic networks, two guanxi discourses were then compared in terms of their major concept sub-communities. This research aims at addressing two questions: Has the meaning of guanxi transformed in contemporary Chinese societies? And how do different socio-economic configurations affect the practice of guanxi? Results suggest that guanxi in interpersonal relationships has adapted to a new family structure in both Chinese societies. In addition, the practice of guanxi in business varies in Mainland China and in TW-HK-M. Furthermore, an extended domain was identified where guanxi is used in a macro-level discussion of state relations.
Network representations of the guanxi discourses enabled reification of the concept and shed light on the understanding of social connections and social orders in contemporary China.

Submitted 17 May, 2016; originally announced May 2016.

Comments: under review. 29 pages + supplementary information

Journal ref: Front. Digit. Humanit. 4:11, 2017

arXiv:1605.04774 [pdf, ps, other]

doi 10.3389/fphy.2016.00034

A Biased Review of Biases in Twitter Studies on Political Collective Action

Authors: Peter Cihon, Taha Yasseri

Abstract: In recent years researchers have gravitated to social media platforms, especially Twitter, as fertile ground for empirical analysis of social phenomena. Social media provides researchers access to trace data of interactions and discourse that once went unrecorded in the offline world. Researchers have sought to use these data to explain social phenomena both particular to social media and applicable to the broader social world. This paper offers a minireview of Twitter-based research on political crowd behavior. This literature offers insight into particular social phenomena on Twitter, but often fails to use standardized methods that permit interpretation beyond individual studies. Moreover, the literature fails to ground methodologies and results in social or political theory, divorcing empirical research from the theory needed to interpret it. Rather, papers focus primarily on methodological innovations for social media analyses, but these too often fail to sufficiently demonstrate the validity of such methodologies. This minireview considers a small number of selected papers; we analyze their (often lack of) theoretical approaches, review their methodological innovations, and offer suggestions as to the relevance of their results for political scientists and sociologists.

Submitted 16 May, 2016; originally announced May 2016.

Comments: Mini-review paper, 10 pages. Draft under review

Journal ref: Front. Phys. 4:34, 2016

arXiv:1602.01652 [pdf]

doi 10.1038/srep36333

Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in Online Collaboration

Authors: Milena Tsvetkova, Ruth García-Gavilanes, Taha Yasseri

Abstract: Disagreement and conflict are a fact of social life and considerably affect our well-being and productivity. Such negative interactions are rarely explicitly declared and recorded and this makes them hard for scientists to study. We overcome this challenge by investigating the patterns in the timing and configuration of contributions to a large online collaboration community. We analyze sequences of reverts of contributions to Wikipedia, the largest online encyclopedia, and investigate how often and how fast they occur compared to a null model that randomizes the order of actions to remove any systematic clustering. We find evidence that individuals systematically attack the same person and attack back their attacker; both of these interactions occur at a faster response rate than expected. We also establish that individuals come to defend an attack victim but we do not find evidence that attack victims "pay it forward" or that attackers collude to attack the same individual. We further find that high-status contributors are more likely to attack many others serially, status equals are more likely to revenge attacks back, while attacks by lower-status contributors trigger attacks forward; yet, it is the lower-status contributors who also come forward to defend third parties. The method we use can be applied to other large-scale temporal communication and collaboration networks to identify the existence of negative social interactions and other social processes.

Submitted 26 October, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

Comments: Forthcoming in Scientific Reports

Journal ref: Scientific Reports (2016) 6:36333
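
The abstract above compares observed revert sequences with a null model that randomizes the order of actions. The sketch below illustrates that idea in a much simplified form: it counts immediate retaliations (B reverts A directly after A reverted B) and compares the count with shuffled versions of the same event list. The event data, the function names, and the restriction to adjacent events are assumptions for illustration; the paper's method also uses timing information that is not modelled here.

```python
import random

def count_retaliations(events):
    """Count adjacent pairs where (a reverts b) is immediately followed by (b reverts a)."""
    return sum(
        1
        for (a, b), (c, d) in zip(events, events[1:])
        if a == d and b == c
    )

def shuffled_null(events, n_shuffles=1000, seed=0):
    """Retaliation counts under random reorderings of the same events."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_shuffles):
        perm = events[:]      # copy so the observed order is untouched
        rng.shuffle(perm)
        counts.append(count_retaliations(perm))
    return counts

# Hypothetical (reverter, reverted) pairs in chronological order.
events = [("ann", "bob"), ("bob", "ann"), ("cat", "dan"),
          ("dan", "cat"), ("eve", "ann"), ("bob", "eve")]
observed = count_retaliations(events)
null = shuffled_null(events)
# Share of shuffles with at least as many retaliations as observed.
p_value = sum(c >= observed for c in null) / len(null)
```

Shuffling keeps who reverted whom fixed and destroys only the ordering, so any excess of retaliations over the shuffled counts reflects temporal clustering rather than the mix of participants.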

arXiv:1601.06805 [pdf, other]

doi 10.3389/fphy.2016.00006

P-values: misunderstood and misused

Authors: Bertie Vidgen, Taha Yasseri

Abstract: P-values are widely used in both the social and natural sciences to quantify the statistical significance of observed results. The recent surge of big data research has made the p-value an even more popular tool to test the significance of a study. However, substantial literature has been produced critiquing how p-values are used and understood. In this paper we review this recent critical literature, much of which is rooted in the life sciences, and consider its implications for social scientific research. We provide a coherent picture of what the main criticisms are, and draw together and disambiguate common themes. In particular, we explain how the False Discovery Rate is calculated, and how this differs from a p-value. We also make explicit the Bayesian nature of many recent criticisms, a dimension that is often underplayed or ignored. We conclude by identifying practical steps to help remediate some of the concerns identified. We recommend that (i) far lower significance levels are used, such as $0.01$ or $0.001$, and (ii) p-values are interpreted contextually, and situated within both the findings of the individual study and the broader field of inquiry (through, for example, meta-analyses).

Submitted 10 March, 2016; v1 submitted 25 January, 2016; originally announced January 2016.

Comments: Published in Frontiers in Physics: Vidgen B and Yasseri T (2016) P-Values: Misunderstood and Misused. Front. Phys. 4:6

Journal ref: Front. Phys. 4:6, 2016
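
The abstract above contrasts the False Discovery Rate with the p-value. As a minimal sketch of one common way to compute it (not necessarily the exact formulation used in the paper), the snippet below assumes that a known fraction of tested hypotheses are real effects; the function name and the `power` and `prior` inputs are illustrative assumptions.

```python
def false_discovery_rate(alpha: float, power: float, prior: float) -> float:
    """Return the share of 'significant' results that are false positives.

    alpha: significance level, P(significant | no real effect)
    power: P(significant | real effect)
    prior: assumed fraction of tested hypotheses with a real effect
           (a hypothetical input, not a value taken from the paper)
    """
    false_positives = alpha * (1.0 - prior)   # null hypotheses that pass the test
    true_positives = power * prior            # real effects that pass the test
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Compare the conventional threshold with the stricter ones recommended above.
    for alpha in (0.05, 0.01, 0.001):
        fdr = false_discovery_rate(alpha, power=0.8, prior=0.1)
        print(f"alpha={alpha}: FDR={fdr:.3f}")
```

Under these assumed inputs, a threshold of 0.05 gives 0.045 / (0.045 + 0.08) = 0.36, so roughly a third of "significant" findings would be false even though each one has p < 0.05. Lowering the threshold shrinks this share, which is in line with recommendation (i).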

arXiv:1508.01409 [pdf, ps, other]

doi 10.1063/1.4998436

Crossing Statistics of Anisotropic Stochastic Surface

Authors: M. Ghasemi Nezhadhaghighi, S. M. S. Movahed, T. Yasseri, S. M. Vaez Allaei

Abstract: In this paper, we propose crossing statistics and its generalization, as a new framework to characterize the anisotropy in a 2D field, e.g. height on a surface, extendable to higher dimensions. By measuring $ν^+$, the number of up-crossings (crossing points with positive slope at a given threshold of height ($α$)), and $N_{tot}$ (the generalized roughness function), it is possible to distinguish the nature of anisotropy, rotational invariance and Gaussianity of any given surface. For the case of anisotropic correlated self- or multi-affine surfaces (even with different correlation lengths in various directions and/or directional scaling exponents), we analytically derive some relations between $ν^+$ and $N_{tot}$ with corresponding scaling parameters. The method systematically distinguishes the directions of anisotropy, at the $3σ$ confidence level using P-value statistics. After applying a typical method in determining the corresponding scaling exponents in identified anisotropic directions, we are able to determine the kind and ratio of correlation length anisotropy. To demonstrate the capability and accuracy of the method, as well as the validity of the analytical relations, our proposed measures are calculated on synthetic stochastic rough interfaces and rough interfaces generated from simulation of ion etching. There is good consistency between analytical and numerical computations. The proposed algorithm can be mounted with simple software on various instruments for surface analysis and characterization, such as AFM and STM.

Submitted 19 November, 2016; v1 submitted 6 August, 2015; originally announced August 2015.

Comments: 13 pages and 11 figures, major revision and add some new references, submitted to Phys. Rev. B

Journal ref: Journal of Applied Physics 122, 085302 (2017)
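
The up-crossing count described in the abstract above can be illustrated with a short sketch. It counts threshold crossings with positive slope along 1D height profiles and compares the counts along the rows and columns of a 2D grid. The function names and the grid representation are assumptions made for illustration, and the sketch does not implement the paper's generalized roughness function $N_{tot}$ or its significance test.

```python
from typing import List, Sequence, Tuple


def count_up_crossings(heights: Sequence[float], threshold: float) -> int:
    """Count points where the profile passes the threshold with positive slope."""
    count = 0
    for previous, current in zip(heights, heights[1:]):
        if previous < threshold <= current:
            count += 1
    return count


def directional_up_crossings(
    surface: List[List[float]], threshold: float
) -> Tuple[int, int]:
    """Return total up-crossings along rows and along columns of a height grid."""
    along_rows = sum(count_up_crossings(row, threshold) for row in surface)
    columns = [list(column) for column in zip(*surface)]
    along_columns = sum(count_up_crossings(col, threshold) for col in columns)
    return along_rows, along_columns


if __name__ == "__main__":
    # A striped toy surface: it varies along rows but is constant along columns.
    striped = [[0.0, 1.0, 0.0, 1.0] for _ in range(3)]
    print(directional_up_crossings(striped, threshold=0.5))
```

A large difference between the two directional counts, as in the striped toy surface, is the kind of signal that indicates anisotropy; on real data the comparison would need to be repeated over many thresholds and directions.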

arXiv:1505.01818 [pdf]

doi 10.1140/epjds/s13688-016-0083-3

Wikipedia traffic data and electoral prediction: towards theoretically informed models

Authors: Taha Yasseri, Jonathan Bright

Abstract: The aim of this article is to explore the potential use of Wikipedia page view data for predicting electoral results. Responding to previous critiques of work using socially generated data to predict elections, which have argued that these predictions take place without any understanding of the mechanism which enables them, we first develop a theoretical model which highlights why people might seek information online at election time, and how this activity might relate to overall electoral outcomes, focussing especially on how different types of parties such as new and established parties might generate different information seeking patterns. We test this model on a novel dataset drawn from a variety of countries in the 2009 and 2014 European Parliament elections. We show that while Wikipedia offers little insight into absolute vote outcomes, it offers good information about changes in both overall turnout at elections and in vote share for particular parties. These results are used to enhance existing theories about the drivers of aggregate patterns in online information seeking.

Submitted 22 January, 2016; v1 submitted 5 May, 2015; originally announced May 2015.

Comments: submitted to EPJ Data Science. Additional File 1 available at https://drive.google.com/open?id=0BxaGC-YCTO6SWkJhRXlrMVRYVlE

Journal ref: EPJ Data Science, 5: 22 (2016)

arXiv:1411.3662 [pdf]

doi 10.1038/srep06447

Structural limitations of learning in a crowd: communication vulnerability and information diffusion in MOOCs

Authors: Nabeel Gillani, Taha Yasseri, Rebecca Eynon, Isis Hjorth

Abstract: Massive Open Online Courses (MOOCs) bring together a global crowd of thousands of learners for several weeks or months. In theory, the openness and scale of MOOCs can promote iterative dialogue that facilitates group cognition and knowledge construction. Using data from two successive instances of a popular business strategy MOOC, we filter observed communication patterns to arrive at the "significant" interaction networks between learners and use complex network analysis to explore the vulnerability and information diffusion potential of the discussion forums. We find that different discussion topics and pedagogical practices promote varying levels of 1) "significant" peer-to-peer engagement, 2) participant inclusiveness in dialogue, and ultimately, 3) modularity, which impacts information diffusion to prevent a truly "global" exchange of knowledge and learning. These results indicate the structural limitations of large-scale crowd-based learning and highlight the different ways that learners in MOOCs leverage, and learn within, social contexts. We conclude by exploring how these insights may inspire new developments in online education.

Submitted 13 November, 2014; originally announced November 2014.

Comments: Pre-print version. Published version available at http://dx.doi.org/10.1038/srep06447

Journal ref: Sci Rep 4, 6447 (2014)

arXiv:1408.3562 [pdf]

doi 10.1371/journal.pone.0196068

Investigating Political Participation and Social Information Using Big Data and a Natural Experiment

Authors: Scott A. Hale, Peter John, Helen Margetts, Taha Yasseri

Abstract: Social information is particularly prominent in digital settings where the design of platforms can more easily give real-time information about the behaviour of peers and reference groups and thereby stimulate political activity. Changes to these platforms can generate natural experiments allowing an assessment of the impact of changes in social information and design on participation. This paper investigates the impact of the introduction of trending information on the homepage of the UK government petitions platform. Using interrupted time series and a regression discontinuity design, we find that the introduction of the trending feature had no statistically significant effect on the overall number of signatures per day, but that the distribution of signatures across petitions changes: the most popular petitions gain even more signatures at the expense of those with fewer signatories. We find significant differences between petitions trending at different ranks, even after controlling for each petition's individual growth prior to trending. The findings suggest a non-negligible group of individuals visit the homepage of the site looking for petitions to sign and therefore see the list of trending petitions, and a significant proportion of this group responds to the social information that it provides. These findings contribute to our understanding of how social information, and the form in which it is presented, affects individual political behaviour in digital settings.

Submitted 15 August, 2014; originally announced August 2014.

Comments: Prepared for delivery at the 2014 Annual Meeting of the American Political Science Association, August 28-31, 2014

Journal ref: PLOS ONE 13(4): e0196068 (2018)

arXiv:1405.2856 [pdf, other]

doi 10.1145/2615569.2615691

Mapping the UK Webspace: Fifteen Years of British Universities on the Web

Authors: Scott A. Hale, Taha Yasseri, Josh Cowls, Eric T. Meyer, Ralph Schroeder, Helen Margetts

Abstract: This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research.

Submitted 12 May, 2014; originally announced May 2014.

Comments: To appear in the proceeding of WebSci 2014

Journal ref: Proceedings of the 2014 ACM conference on Web science (WebSci '14). Association for Computing Machinery, New York, NY, USA, 62-70

arXiv:1403.3568 [pdf, other]

doi 10.1140/epjds/s13688-014-0007-z

Modeling Social Dynamics in a Collaborative Environment

Authors: Gerardo Iñiguez, János Török, Taha Yasseri, Kimmo Kaski, János Kertész

Abstract: Wikipedia is a prime example of today's value production in a collaborative environment. Using this example, we model the emergence, persistence and resolution of severe conflicts during collaboration by coupling opinion formation with article editing in a bounded confidence dynamics. The complex social behavior involved in editing articles is implemented as a minimal model with two basic elements: (i) individuals interact directly to share information and convince each other, and (ii) they edit a common medium to establish their own opinions. The opinions of the editors and the opinion represented by the article are each characterised by a scalar variable. When the pool of editors is fixed, three regimes can be distinguished: (a) a stable mainstream article opinion is continuously contested by editors with extremist views and there is slow convergence towards consensus, (b) the article oscillates between editors with extremist views, reaching consensus relatively fast at one of the extremes, and (c) the extremist editors are converted very fast to the mainstream opinion and the article has an erratic evolution. When editors are renewed with a certain rate, a dynamical transition occurs between different kinds of edit wars, which qualitatively reflect the dynamics of conflicts as observed in real Wikipedia data.

Submitted 14 June, 2014; v1 submitted 14 March, 2014; originally announced March 2014.

Comments: Revised version, to appear in EPJ Data Science; 19 pages 9 figures

Journal ref: EPJ Data Science 3 (1), 7 (2014)

arXiv:1312.2818 [pdf, ps, other]

doi 10.1515/itit-2014-1046

Can electoral popularity be predicted using socially generated big data?

Authors: Taha Yasseri, Jonathan Bright

Abstract: Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams, Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

Submitted 8 August, 2014; v1 submitted 10 December, 2013; originally announced December 2013.

Comments: To appear in Information Technology

Journal ref: it - Information Technology, vol. 56, no. 5, 2014, pp. 246-253

arXiv:1310.8508 [pdf, other]

doi 10.1140/epjds20

The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics

Authors: Anna Samoilenko, Taha Yasseri

Abstract: Activity of modern scholarship creates online footprints galore. Along with traditional metrics of research quality, such as citation counts, online images of researchers and institutions increasingly matter in evaluating academic impact, decisions about grant allocation, and promotion. We examined 400 biographical Wikipedia articles on academics from four scientific fields to test if being featured in the world's largest online encyclopedia is correlated with higher academic notability (assessed through citation counts). We found no statistically significant correlation between Wikipedia article metrics (length, number of edits, number of incoming links from other articles, etc.) and academic notability of the mentioned researchers. We also did not find any evidence that the scientists with better WP representation are necessarily more prominent in their fields. In addition, we inspected the Wikipedia coverage of notable scientists sampled from the Thomson Reuters list of "highly cited researchers". In each of the examined fields, Wikipedia failed to cover notable scholars properly. Both findings imply that Wikipedia might be producing an inaccurate image of academics on the front end of science. By shedding light on how public perception of academic progress is formed, this study warns that a subjective element might have been introduced into the hitherto structured system of academic evaluation.

Submitted 10 December, 2013; v1 submitted 31 October, 2013; originally announced October 2013.

Comments: To appear in EPJ Data Science. To have the Additional Files and Datasets e-mail the corresponding author

Journal ref: EPJ Data Science 2014, 3:1

arXiv:1308.0239 [pdf, other]

doi 10.1140/epjds/s13688-017-0116-6

Rapid rise and decay in petition signing

Authors: Taha Yasseri, Scott A. Hale, Helen Margetts

Abstract: Contemporary collective action, much of which involves social media and other Internet-based platforms, leaves a digital imprint which may be harvested to better understand the dynamics of mobilization. Petition signing is an example of collective action which has gained in popularity with rising use of social media and provides such data for the whole population of petition signatories for a given platform. This paper tracks the growth curves of all 20,000 petitions to the UK government petitions website (http://epetitions.direct.gov.uk) and 1,800 petitions to the US White House site (https://petitions.whitehouse.gov), analyzing the rate of growth and outreach mechanism. Previous research has suggested the importance of the first day to the ultimate success of a petition, but has not examined early growth within that day, made possible here through hourly resolution in the data. The analysis shows that the vast majority of petitions do not achieve any measure of success; over 99 percent fail to get the 10,000 signatures required for an official response and only 0.1 percent attain the 100,000 required for a parliamentary debate (0.7 percent in the US). We analyze the data through a multiplicative process model framework to explain the heterogeneous growth of signatures at the population level. We define and measure an average outreach factor for petitions and show that it decays very fast (reducing to 0.1 percent after 10 hours in the UK and 30 hours in the US). After a day or two, a petition's fate is virtually set.
The findings challenge conventional analyses of collective action from economics and political science, where the production function has been assumed to follow an S-shaped curve.

Submitted 3 January, 2023; v1 submitted 1 August, 2013; originally announced August 2013.

Comments: For the final version see https://link.springer.com/content/pdf/10.1140/epjds/s13688-017-0116-6.pdf

Journal ref: EPJ Data Science (2017) 6:20

arXiv:1305.5566 [pdf]

The most controversial topics in Wikipedia: A multilingual and geographical analysis

Authors: Taha Yasseri, Anselm Spoerri, Mark Graham, János Kertész

Abstract: We present, visualize and analyse the similarities and differences between the controversial topics related to "edit wars" identified in 10 different language versions of Wikipedia. After a brief review of the related work we describe the methods developed to locate, measure, and categorize the controversial topics in the different languages. Visualizations of the degree of overlap between the top 100 lists of most controversial articles in different languages and the content related to geographical locations will be presented. We discuss what the presented analysis and visualizations can tell us about the multicultural aspects of Wikipedia and practices of peer-production. Our results indicate that Wikipedia is more than just an encyclopaedia; it is also a window into convergent and divergent social-spatial priorities, interests and preferences.

Submitted 8 July, 2013; v1 submitted 23 May, 2013; originally announced May 2013.

Comments: This is a draft of a book chapter to be published in 2014 by Scarecrow Press. Please cite as: Yasseri T., Spoerri A., Graham M., and Kertész J., The most controversial topics in Wikipedia: A multilingual and geographical analysis. In: Fichman P., Hara N., editors, Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press (2014)

arXiv:1304.2031 [pdf, other]

doi 10.1145/2491055.2491068

Temporal Analysis of Activity Patterns of Editors in Collaborative Mapping Project of OpenStreetMap

Authors: Taha Yasseri, Giovanni Quattrone, Afra Mashhadi

Abstract: In recent years, wikis have become an attractive platform for social studies of human behaviour. Containing millions of records of edits across the globe, collaborative systems such as Wikipedia have allowed researchers to gain a better understanding of editors' participation and their activity patterns. However, contributions made to geo-wikis (wiki-based collaborative mapping projects) differ from systems such as Wikipedia in a fundamental way, due to the spatial dimension of the content, which limits the contributors to those who possess local knowledge about a specific area. Cross-platform studies and comparisons are therefore required to build a comprehensive image of online open collaboration phenomena. In this work, we study the temporal behavioural pattern of editors of OpenStreetMap, a successful example of a geo-wiki, for two European capital cities. We categorise different types of temporal patterns and report on the historical trend within a period of 7 years of the project age. We also draw a comparison with the previously observed editing activity patterns of Wikipedia.

Submitted 7 April, 2013; originally announced April 2013.

Comments: Submitted

Journal ref: Proceedings of the 9th International Symposium on Open Collaboration (WikiSym '13). Association for Computing Machinery, New York, NY, USA, Article 13, 1-4 (2013)

arXiv:1304.0588 [pdf, other]

doi 10.1145/2464464.2464518

Petition Growth and Success Rates on the UK No. 10 Downing Street Website

Authors: Scott A. Hale, Helen Margetts, Taha Yasseri

Abstract: Now that so much of collective action takes place online, web-generated data can further understanding of the mechanics of Internet-based mobilisation. This trace data offers social science researchers the potential for new forms of analysis, using real-time transactional data based on entire populations, rather than sample-based surveys of what people think they did or might do. This paper uses a 'big data' approach to track the growth of over 8,000 petitions to the UK Government on the No. 10 Downing Street website for two years, analysing the rate of growth per day and testing the hypothesis that the distribution of daily change will be leptokurtic (rather than normal) as previous research on agenda setting would suggest. This hypothesis is confirmed, suggesting that Internet-based mobilisation is characterized by tipping points (or punctuated equilibria) and explaining some of the volatility in online collective action. We find also that most successful petitions grow quickly and that the number of signatures a petition receives on its first day is a significant factor in explaining the overall number of signatures a petition receives during its lifetime. These findings have implications for the strategies of those initiating petitions and the design of web sites with the aim of maximising citizen engagement with policy issues.

Submitted 2 April, 2013; originally announced April 2013.

Comments: To appear in proceeding of WebSci'13, May 1-5, 2013, Paris, France

Journal ref: WebSci '13 Proceedings of the 5th Annual ACM Web Science Conference, Pages 132-138, 2013
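
The abstract above tests whether daily changes in signatures are leptokurtic rather than normal. A minimal sketch of that kind of check is the sample excess kurtosis, which is zero for a normal distribution and positive for a leptokurtic one. The function name and the toy data are illustrative assumptions, and the paper's actual test procedure may differ.

```python
from typing import Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Return the population excess kurtosis m4 / m2**2 - 3 of a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("all observations are identical")
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    # Toy series of daily changes: mostly no change, with two large jumps.
    daily_changes = [0.0] * 8 + [10.0, -10.0]
    print(excess_kurtosis(daily_changes))
```

A clearly positive value, as for the toy series with long quiet stretches and rare large jumps, points to heavier tails than a normal distribution, which is the pattern associated with punctuated growth.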

arXiv:1211.0970 [pdf, other]

doi 10.1371/journal.pone.0071226

Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

Authors: Márton Mestyán, Taha Yasseri, János Kertész

Abstract: Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of society's reaction to a new product in the sense of popularity and adoption rate. However, bridging the gap between "real time monitoring" and "early predicting" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's entry in Wikipedia, the well-known online encyclopedia.

Submitted 26 June, 2013; v1 submitted 5 November, 2012; originally announced November 2012.

Comments: 13 pages, Including Supporting Information, 7 Figures, Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zip

Journal ref: PLoS ONE 8(8): e71226 (2013)

arXiv:1208.5130 [pdf, other]

doi 10.1007/s10955-013-0728-6

Value production in a collaborative environment

Authors: Taha Yasseri, János Kertész

Abstract: We review some recent endeavors and add some new results to characterize and understand underlying mechanisms in Wikipedia (WP), the paradigmatic example of collaborative value production. We analyzed the statistics of editorial activity in different languages and observed typical circadian and weekly patterns, which enabled us to estimate the geographical origins of contributions to WPs in languages spoken in several time zones. Using a recently introduced measure we showed that the editorial activities have intrinsic dependencies in the burstiness of events. A comparison of the English and Simple English WPs revealed important aspects of language complexity and showed how peer cooperation solved the task of enhancing readability. One of our focus issues was characterizing the conflicts or edit wars in WPs, which helped us to automatically filter out controversial pages. When studying the temporal evolution of the controversiality of such pages we identified typical patterns and classified conflicts accordingly. Our quantitative analysis provides the basis for modeling conflicts and their resolution in collaborative environments and contributes to the understanding of this issue, which becomes increasingly important with the development of information communication technology.

Submitted 14 February, 2013; v1 submitted 25 August, 2012; originally announced August 2012.

Comments: In press: Special Issue of Journal of Statistical Physics: Statistical Mechanics and Social Science

Journal ref: Journal of Statistical Physics May 2013, Volume 151, Issue 3-4, pp 414-439

arXiv:1207.4914 [pdf, other]

doi 10.1103/PhysRevLett.110.088701

Opinions, Conflicts and Consensus: Modeling Social Dynamics in a Collaborative Environment

Authors: János Török, Gerardo Iñiguez, Taha Yasseri, Maxi San Miguel, Kimmo Kaski, János Kertész

Abstract: Information-communication technology promotes collaborative environments like Wikipedia where, however, controversiality and conflicts can appear. To describe the rise, persistence, and resolution of such conflicts we devise an extended opinion dynamics model where agents with different opinions perform a single task to make a consensual product. As a function of the convergence parameter describing the influence of the product on the agents, the model shows spontaneous symmetry breaking of the final consensus opinion represented by the medium. In the case when agents are replaced with new ones at a certain rate, a transition from mainly consensus to a perpetual conflict occurs, which is in qualitative agreement with the scenarios observed in Wikipedia.

Submitted 22 November, 2012; v1 submitted 20 July, 2012; originally announced July 2012.

Comments: 6 pages, 5 figures. Submitted for publication

Journal ref: Phys. Rev. Lett. 110 (8), 088701 (2013)

arXiv:1204.2765 [pdf, other]

doi 10.1371/journal.pone.0048386

A practical approach to language complexity: a Wikipedia case study

Authors: Taha Yasseri, András Kornai, János Kertész

Abstract: In this paper we present a statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part of speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, e.g. that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.

Submitted 18 August, 2012; v1 submitted 12 April, 2012; originally announced April 2012.

Comments: 2 new figures, 1 new section, and 2 new supporting texts

Journal ref: PLoS ONE 7(11): e48386 (2012)

arXiv:1202.3643 [pdf, other]

doi 10.1371/journal.pone.0038869

Dynamics of conflicts in Wikipedia

Authors: Taha Yasseri, Robert Sumi, András Rung, András Kornai, János Kertész

Abstract: In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build up samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases eventually leading to consensus from those cases where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by few editors only.

Submitted 2 May, 2012; v1 submitted 16 February, 2012; originally announced February 2012.

Comments: Supporting information added

Journal ref: PLoS ONE 7(6): e38869 (2012)

arXiv:1109.1746 [pdf, ps, other]

doi 10.1371/journal.pone.0030091

Circadian patterns of Wikipedia editorial activity: A demographic analysis

Authors: Taha Yasseri, Róbert Sumi, János Kertész

Abstract: Wikipedia (WP) as a collaborative, dynamical system of humans is an appropriate subject of social studies. Each single action of the members of this society, i.e. editors, is well recorded and accessible. Using the cumulative data of 34 Wikipedias in different languages, we try to characterize and find the universalities and differences in temporal activity patterns of editors. Based on this data, we estimate the geographical distribution of editors for each WP in the globe. Furthermore we also clarify the differences among different groups of WPs, which originate in the variance of cultural and social features of the communities of editors.

Submitted 28 November, 2011; v1 submitted 8 September, 2011; originally announced September 2011.

Journal ref: PLoS ONE 7(1): e30091 (2012)
