-
The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics
Authors:
Anna Samoilenko,
Taha Yasseri
Abstract:
Activity of modern scholarship creates online footprints galore. Along with traditional metrics of research quality, such as citation counts, online images of researchers and institutions increasingly matter in evaluating academic impact, decisions about grant allocation, and promotion. We examined 400 biographical Wikipedia articles on academics from four scientific fields to test if being featured in the world's largest online encyclopedia is correlated with higher academic notability (assessed through citation counts). We found no statistically significant correlation between Wikipedia article metrics (length, number of edits, number of incoming links from other articles, etc.) and the academic notability of the mentioned researchers. We also did not find any evidence that the scientists with better WP representation are necessarily more prominent in their fields. In addition, we inspected the Wikipedia coverage of notable scientists sampled from the Thomson Reuters list of "highly cited researchers". In each of the examined fields, Wikipedia failed to cover notable scholars properly. Both findings imply that Wikipedia might be producing an inaccurate image of academics on the front end of science. By shedding light on how public perception of academic progress is formed, this study warns that a subjective element might have been introduced into the hitherto structured system of academic evaluation.
Submitted 10 December, 2013; v1 submitted 31 October, 2013;
originally announced October 2013.
-
Rapid rise and decay in petition signing
Authors:
Taha Yasseri,
Scott A. Hale,
Helen Margetts
Abstract:
Contemporary collective action, much of which involves social media and other Internet-based platforms, leaves a digital imprint which may be harvested to better understand the dynamics of mobilization. Petition signing is an example of collective action which has gained in popularity with rising use of social media and provides such data for the whole population of petition signatories for a given platform. This paper tracks the growth curves of all 20,000 petitions to the UK government petitions website (http://epetitions.direct.gov.uk) and 1,800 petitions to the US White House site (https://petitions.whitehouse.gov), analyzing the rate of growth and outreach mechanism. Previous research has suggested the importance of the first day to the ultimate success of a petition, but has not examined early growth within that day, made possible here through hourly resolution in the data. The analysis shows that the vast majority of petitions do not achieve any measure of success; over 99 percent fail to get the 10,000 signatures required for an official response and only 0.1 percent attain the 100,000 required for a parliamentary debate (0.7 percent in the US). We analyze the data through a multiplicative process model framework to explain the heterogeneous growth of signatures at the population level. We define and measure an average outreach factor for petitions and show that it decays very fast (reducing to 0.1 percent after 10 hours in the UK and 30 hours in the US). After a day or two, a petition's fate is virtually set. The findings challenge conventional analyses of collective action from economics and political science, where the production function has been assumed to follow an S-shaped curve.
Submitted 3 January, 2023; v1 submitted 1 August, 2013;
originally announced August 2013.
-
The most controversial topics in Wikipedia: A multilingual and geographical analysis
Authors:
Taha Yasseri,
Anselm Spoerri,
Mark Graham,
János Kertész
Abstract:
We present, visualize and analyse the similarities and differences between the controversial topics related to "edit wars" identified in 10 different language versions of Wikipedia. After a brief review of the related work we describe the methods developed to locate, measure, and categorize the controversial topics in the different languages. Visualizations of the degree of overlap between the top 100 lists of most controversial articles in different languages and the content related to geographical locations will be presented. We discuss what the presented analysis and visualizations can tell us about the multicultural aspects of Wikipedia and practices of peer-production. Our results indicate that Wikipedia is more than just an encyclopaedia; it is also a window into convergent and divergent social-spatial priorities, interests and preferences.
Submitted 8 July, 2013; v1 submitted 23 May, 2013;
originally announced May 2013.
-
Temporal Analysis of Activity Patterns of Editors in Collaborative Mapping Project of OpenStreetMap
Authors:
Taha Yasseri,
Giovanni Quattrone,
Afra Mashhadi
Abstract:
In recent years, wikis have become an attractive platform for social studies of human behaviour. Containing millions of records of edits across the globe, collaborative systems such as Wikipedia have allowed researchers to gain a better understanding of editors' participation and their activity patterns. However, contributions made to geo-wikis (wiki-based collaborative mapping projects) differ from systems such as Wikipedia in a fundamental way: the spatial dimension of the content limits the contributors to those who possess local knowledge about a specific area. Cross-platform studies and comparisons are therefore required to build a comprehensive image of online open collaboration phenomena. In this work, we study the temporal behavioural pattern of OpenStreetMap editors, a successful example of a geo-wiki, for two European capital cities. We categorise different types of temporal patterns and report on the historical trend within a period of 7 years of the project age. We also draw a comparison with the previously observed editing activity patterns of Wikipedia.
Submitted 7 April, 2013;
originally announced April 2013.
-
Petition Growth and Success Rates on the UK No. 10 Downing Street Website
Authors:
Scott A. Hale,
Helen Margetts,
Taha Yasseri
Abstract:
Now that so much of collective action takes place online, web-generated data can further understanding of the mechanics of Internet-based mobilisation. This trace data offers social science researchers the potential for new forms of analysis, using real-time transactional data based on entire populations, rather than sample-based surveys of what people think they did or might do. This paper uses a 'big data' approach to track the growth of over 8,000 petitions to the UK Government on the No. 10 Downing Street website for two years, analysing the rate of growth per day and testing the hypothesis that the distribution of daily change will be leptokurtic (rather than normal) as previous research on agenda setting would suggest. This hypothesis is confirmed, suggesting that Internet-based mobilisation is characterized by tipping points (or punctuated equilibria) and explaining some of the volatility in online collective action. We find also that most successful petitions grow quickly and that the number of signatures a petition receives on its first day is a significant factor in explaining the overall number of signatures a petition receives during its lifetime. These findings have implications for the strategies of those initiating petitions and the design of web sites with the aim of maximising citizen engagement with policy issues.
Submitted 2 April, 2013;
originally announced April 2013.
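The leptokurtosis hypothesis in the abstract above can be illustrated with excess kurtosis, which is zero for a normal distribution and positive for a leptokurtic one. The following is a minimal sketch only: the function name and the toy data are illustrative assumptions, and the abstract does not say which test statistic the authors used.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Population excess kurtosis, m4 / m2**2 - 3 (zero for a normal distribution)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)  # second central moment
    m4 = fmean((v - mu) ** 4 for v in values)  # fourth central moment
    if m2 == 0:
        raise ValueError("variance is zero; kurtosis is undefined")
    return m4 / (m2 * m2) - 3.0


# Hypothetical daily changes: mostly no movement, with two rare large jumps.
spiky_changes = [0] * 98 + [-10, 10]
# Hypothetical daily changes of constant magnitude, with no extreme days.
flat_changes = [-1, 1] * 50

print(excess_kurtosis(spiky_changes) > 0)  # heavy tails: leptokurtic
print(excess_kurtosis(flat_changes) > 0)   # no heavy tails
```

A positive value for real daily-change data would be consistent with long quiet periods punctuated by rare large jumps, which is the pattern the abstract describes as tipping points.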
-
Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data
Authors:
Márton Mestyán,
Taha Yasseri,
János Kertész
Abstract:
Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of society's reaction to a new product in the sense of popularity and adoption rate. However, bridging the gap between "real time monitoring" and "early predicting" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the corresponding entry for the movie in Wikipedia, the well-known online encyclopedia.
Submitted 26 June, 2013; v1 submitted 5 November, 2012;
originally announced November 2012.
-
Value production in a collaborative environment
Authors:
Taha Yasseri,
János Kertész
Abstract:
We review some recent endeavors and add some new results to characterize and understand underlying mechanisms in Wikipedia (WP), the paradigmatic example of collaborative value production. We analyzed the statistics of editorial activity in different languages and observed typical circadian and weekly patterns, which enabled us to estimate the geographical origins of contributions to WPs in languages spoken in several time zones. Using a recently introduced measure we showed that the editorial activities have intrinsic dependencies in the burstiness of events. A comparison of the English and Simple English WPs revealed important aspects of language complexity and showed how peer cooperation solved the task of enhancing readability. One of our focus issues was characterizing the conflicts or edit wars in WPs, which helped us to automatically filter out controversial pages. When studying the temporal evolution of the controversiality of such pages we identified typical patterns and classified conflicts accordingly. Our quantitative analysis provides a basis for modeling conflicts and their resolution in collaborative environments and contributes to the understanding of this issue, which becomes increasingly important with the development of information communication technology.
Submitted 14 February, 2013; v1 submitted 25 August, 2012;
originally announced August 2012.
-
Opinions, Conflicts and Consensus: Modeling Social Dynamics in a Collaborative Environment
Authors:
János Török,
Gerardo Iñiguez,
Taha Yasseri,
Maxi San Miguel,
Kimmo Kaski,
János Kertész
Abstract:
Information-communication technology promotes collaborative environments like Wikipedia where, however, controversiality and conflicts can appear. To describe the rise, persistence, and resolution of such conflicts we devise an extended opinion dynamics model where agents with different opinions perform a single task to make a consensual product. As a function of the convergence parameter describing the influence of the product on the agents, the model shows spontaneous symmetry breaking of the final consensus opinion represented by the medium. In the case when agents are replaced with new ones at a certain rate, a transition from mainly consensus to a perpetual conflict occurs, which is in qualitative agreement with the scenarios observed in Wikipedia.
Submitted 22 November, 2012; v1 submitted 20 July, 2012;
originally announced July 2012.
-
A practical approach to language complexity: a Wikipedia case study
Authors:
Taha Yasseri,
András Kornai,
János Kertész
Abstract:
In this paper we present statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part of speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, e.g. that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
Submitted 18 August, 2012; v1 submitted 12 April, 2012;
originally announced April 2012.
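The Gunning readability (fog) index mentioned in the abstract above combines average sentence length with the share of words of three or more syllables: 0.4 × (words per sentence + 100 × complex words / words). The sketch below is illustrative only: the vowel-group syllable counter and the regex-based sentence splitter are rough assumptions, not the tooling used in the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Gunning fog index: 0.4 * (words/sentences + 100 * complex_words/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # "Complex" words are those with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    complex_share = 100.0 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + complex_share)


short_sentences = "The cat sat. The dog ran."
one_long_sentence = "The cat sat and the dog ran."
print(gunning_fog(short_sentences) < gunning_fog(one_long_sentence))
```

Because both toy texts contain no complex words, the index differs only through sentence length, which mirrors the abstract's point that shorter sentences alone can lower measured complexity.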
-
Dynamics of conflicts in Wikipedia
Authors:
Taha Yasseri,
Robert Sumi,
András Rung,
András Kornai,
János Kertész
Abstract:
In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build up samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases eventually leading to consensus from those cases where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by only a few editors.
Submitted 2 May, 2012; v1 submitted 16 February, 2012;
originally announced February 2012.
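Burstiness of activity, as referred to in the abstract above, is often quantified from the gaps between consecutive events. One widely used coefficient is B = (σ − μ) / (σ + μ), where μ and σ are the mean and standard deviation of the inter-event times: B is −1 for perfectly regular activity, near 0 for a Poisson process, and approaches 1 for highly bursty activity. The sketch below assumes this coefficient and uses made-up timestamps; the abstract does not state which measure the authors applied.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of inter-event times."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events")
    mu = mean(gaps)
    sigma = pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events are simultaneous; burstiness is undefined")
    return (sigma - mu) / (sigma + mu)


# Hypothetical edit times (arbitrary units), not data from the paper.
regular_edits = [0, 10, 20, 30]
bursty_edits = [0, 1, 2, 3, 100, 101, 102, 200]

print(burstiness(regular_edits) < burstiness(bursty_edits))
```

Under this measure, an article edited in short intense flurries separated by long pauses scores higher than one edited at a steady pace, even if both receive the same number of edits.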
-
A Monte Carlo study of surface sputtering by dual and rotated ion beams
Authors:
Taha Yasseri,
Reiner Kree
Abstract:
Several recently proposed methods of surface manufacturing based on ion beam sputtering, which involve dual beam setups, sequential application of ion beams from different directions, or sample rotation, are studied with the method of kinetic Monte Carlo simulation of ion beam erosion and surface diffusion. In this work, we only consider erosion-dominated situations. The results are discussed by comparing them to a number of theoretical propositions and to experimental findings. Two ion beams aligned opposite to each other produce stationary, symmetric ripples. Two ion beams crossing at a right angle will produce square patterns only if they are exactly balanced. In all other cases of crossed beams, ripple patterns are created, and their orientations are shown to be predictable from linear continuum theory. In sequential ion beam sputtering we find a very rapid destruction of structures created from the previous beam direction after a rotation step, which leads to a transient decrease of overall roughness. Superpositions of patterns from several rotation steps are difficult to obtain, as they exist only in very short time windows. In setups with a single beam directed towards a rotating sample, we find a non-monotonic dependence of roughness on rotation frequency, with a very pronounced minimum appearing at the frequency scale set by the relaxation of prestructures observed in sequential ion beam setups. Furthermore, we find that the logarithm of the height of structures decreases in proportion to the inverse frequency.
Submitted 27 September, 2011;
originally announced September 2011.
-
Circadian patterns of Wikipedia editorial activity: A demographic analysis
Authors:
Taha Yasseri,
Róbert Sumi,
János Kertész
Abstract:
Wikipedia (WP) as a collaborative, dynamical system of humans is an appropriate subject of social studies. Each single action of the members of this society, i.e. editors, is well recorded and accessible. Using the cumulative data of 34 Wikipedias in different languages, we try to characterize and find the universalities and differences in temporal activity patterns of editors. Based on this data, we estimate the geographical distribution of editors across the globe for each WP. Furthermore, we clarify the differences among different groups of WPs, which originate in the variance of cultural and social features of the communities of editors.
Submitted 28 November, 2011; v1 submitted 8 September, 2011;
originally announced September 2011.
-
Edit wars in Wikipedia
Authors:
Róbert Sumi,
Taha Yasseri,
András Rung,
András Kornai,
János Kertész
Abstract:
We present a new, efficient method for automatically detecting severe conflicts ('edit wars') in Wikipedia and evaluate this method on six different language WPs. We discuss how the number of edits, reverts, the length of discussions, the burstiness of edits and reverts deviate in such pages from those following the general workflow, and argue that earlier work has significantly over-estimated the contentiousness of the Wikipedia editing process.
Submitted 9 February, 2012; v1 submitted 19 July, 2011;
originally announced July 2011.