-
How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs?
Authors:
Indira Sen,
Mattia Samory,
Fabian Floeck,
Claudia Wagner,
Isabelle Augenstein
Abstract:
As NLP models are increasingly deployed in socially situated settings such as online abusive content detection, it is crucial to ensure that these models are robust. One way of improving model robustness is to generate counterfactually augmented data (CAD) for training models that can better learn to distinguish between core features and data artifacts. While models trained on this type of data have shown promising out-of-domain generalizability, it is still unclear what the sources of such improvements are. We investigate the benefits of CAD for social NLP models by focusing on three social computing constructs -- sentiment, sexism, and hate speech. Assessing the performance of models trained with and without CAD across different types of datasets, we find that while models trained on CAD show lower in-domain performance, they generalize better out-of-domain. We unpack this apparent discrepancy using machine explanations and find that CAD reduces model reliance on spurious features. Leveraging a novel typology of CAD to analyze their relationship with model performance, we find that CAD which acts on the construct directly or a diverse set of CAD leads to higher performance.
Submitted 14 September, 2021;
originally announced September 2021.
-
'I Updated the <ref>': The Evolution of References in the English Wikipedia and the Implications for Altmetrics
Authors:
Olga Zagovora,
Roberto Ulloa,
Katrin Weller,
Fabian Flöck
Abstract:
With this work, we present a publicly available dataset of the history of all the references (more than 55 million) ever used in the English Wikipedia until June 2019. We have applied a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about associated actions: creation, modifications, deletions, and reinsertions. The high accuracy of this method and the resulting dataset was confirmed via a comprehensive crowdworker labelling campaign. We use the dataset to study the temporal evolution of Wikipedia references as well as users' editing behaviour. We find evidence of a mostly productive and continuous effort to improve the quality of references: (1) there is a persistent increase of reference and document identifiers (DOI, PubMedID, PMC, ISBN, ISSN, ArXiv ID), and (2) most of the reference curation work is done by registered humans (not bots or anonymous editors). We conclude that the evolution of Wikipedia references, including the dynamics of the community processes that tend to them, should be leveraged in the design of relevance indexes for altmetrics, and our dataset can be pivotal for such an effort.
Submitted 6 October, 2020;
originally announced October 2020.
-
"Call me sexist, but...": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples
Authors:
Mattia Samory,
Indira Sen,
Julian Kohne,
Fabian Floeck,
Claudia Wagner
Abstract:
Research has focused on automated methods to effectively detect sexism online. Although overt sexism seems easy to spot, its subtle forms and manifold expressions are not. In this paper, we outline the different dimensions of sexism by grounding them in their implementation in psychological scales. From the scales, we derive a codebook for sexism in social media, which we use to annotate existing and novel datasets, surfacing their limitations in breadth and validity with respect to the construct of sexism. Next, we leverage the annotated datasets to generate adversarial examples, and test the reliability of sexism detection methods. Results indicate that current machine learning models pick up on a very narrow set of linguistic markers of sexism and do not generalize well to out-of-domain examples. Yet, including diverse data and adversarial examples at training time results in models that generalize better and that are more robust to artifacts of data collection. By providing a scale-based codebook and insights regarding the shortcomings of the state-of-the-art, we hope to contribute to the development of better and broader models for sexism detection, including reflections on theory-driven approaches to data collection.
Submitted 2 June, 2021; v1 submitted 27 April, 2020;
originally announced April 2020.
-
TED-On: A Total Error Framework for Digital Traces of Human Behavior on Online Platforms
Authors:
Indira Sen,
Fabian Floeck,
Katrin Weller,
Bernd Weiss,
Claudia Wagner
Abstract:
People's activities and opinions recorded as digital traces online, especially on social media and other web-based platforms, offer increasingly informative pictures of the public. They promise to allow inferences about populations beyond the users of the platforms on which the traces are recorded, representing real potential for the Social Sciences and a complement to survey-based research. But the use of digital traces brings its own complexities and new error sources to the research enterprise. Recently, researchers have begun to discuss the errors that can occur when digital traces are used to learn about humans and social phenomena. This article synthesizes this discussion and proposes a systematic way to categorize potential errors, inspired by the Total Survey Error (TSE) Framework developed for survey methodology. We introduce a conceptual framework to diagnose, understand, and document errors that may occur in studies based on such digital traces. While there are clear parallels to the well-known error sources in the TSE framework, the new "Total Error Framework for Digital Traces of Human Behavior on Online Platforms" (TED-On) identifies several types of error that are specific to the use of digital traces. By providing a standard vocabulary to describe these errors, the proposed framework is intended to advance communication and research concerning the use of digital traces in scientific social research.
Submitted 3 June, 2021; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Demographic Inference and Representative Population Estimates from Multilingual Social Media Data
Authors:
Zijian Wang,
Scott A. Hale,
David Adelani,
Przemyslaw A. Grabowicz,
Timo Hartmann,
Fabian Flöck,
David Jurgens
Abstract:
Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of age, gender, and organization-status of social media users that operates in 32 languages. This method substantially outperforms current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media.
Submitted 15 May, 2019;
originally announced May 2019.
-
Characterizing the Global Crowd Workforce: A Cross-Country Comparison of Crowdworker Demographics
Authors:
Lisa Posch,
Arnim Bleier,
Fabian Flöck,
Clemens M. Lechner,
Katharina Kinder-Kurlanda,
Denis Helic,
Markus Strohmaier
Abstract:
Since its emergence roughly a decade ago, microtask crowdsourcing has been attracting a heterogeneous set of workers from all over the globe. This paper sets out to explore the characteristics of the international crowd workforce and offers a cross-national comparison of crowdworker populations from ten countries. We provide an analysis and comparison of demographic characteristics and shed light on the significance of microtask income for workers situated in different national contexts. With over 11,000 individual responses, this study is the first large-scale country-level analysis of the characteristics of workers on the platform Appen (formerly CrowdFlower and Figure Eight), one of the two platforms dominating the microtask market. We find large differences between the characteristics of the crowd workforces of different countries, both regarding demography and regarding the importance of microtask income for workers. Furthermore, we find that the composition of the workforce in the ten countries was largely stable across samples taken at different points in time.
Submitted 3 November, 2022; v1 submitted 14 December, 2018;
originally announced December 2018.
-
Query for Architecture, Click through Military: Comparing the Roles of Search and Navigation on Wikipedia
Authors:
Dimitar Dimitrov,
Florian Lemmerich,
Fabian Flöck,
Markus Strohmaier
Abstract:
As one of the richest sources of encyclopedic information on the Web, Wikipedia generates an enormous amount of traffic. In this paper, we study large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by following hyperlinks. To this end, we propose and employ two main metrics, namely (i) searchshare -- the relative amount of views an article received by search --, and (ii) resistance -- the ability of an article to relay traffic to other Wikipedia articles -- to characterize articles. We demonstrate how articles in distinct topical categories differ substantially in terms of these properties. For example, architecture-related articles are often accessed through search and are simultaneously a "dead end" for traffic, whereas historical articles about military events are mainly navigated. We further link traffic differences to varying network, content, and editing activity features. Lastly, we measure the impact of the article properties by modeling access behavior on articles with a gradient boosting approach. The results of this paper constitute a step towards understanding human information seeking behavior on the Web.
Submitted 10 May, 2018;
originally announced May 2018.
-
A Cross-Country Comparison of Crowdworker Motivations
Authors:
Lisa Posch,
Arnim Bleier,
Fabian Flöck,
Markus Strohmaier
Abstract:
Crowd employment is a new form of short-term employment that has been rapidly becoming a source of income for a vast number of people around the globe. It differs considerably from more traditional forms of work, yet similar ethical and optimization issues arise. One key to tackling such challenges is to understand what motivates the international crowd workforce. In this work, we study the motivation of workers involved in one particularly prevalent type of crowd employment: micro-tasks. We report on the results of applying the Multidimensional Crowdworker Motivation Scale (MCMS) in ten countries, which unveil significant international differences.
Submitted 8 November, 2017;
originally announced November 2017.
-
"(Weitergeleitet von Journalistin)": The Gendered Presentation of Professions on Wikipedia
Authors:
Olga Zagovora,
Fabian Flöck,
Claudia Wagner
Abstract:
Previous research has shown the existence of gender biases in the depiction of professions and occupations in search engine results. Such an unbalanced presentation might just as likely occur on Wikipedia, one of the most popular knowledge resources on the Web, since the encyclopedia has already been found to exhibit such tendencies in past studies. Under this premise, our work assesses gender bias with respect to the content of German Wikipedia articles about professions and occupations along three dimensions: the male vs. female titles used (and redirects), the images of persons included, and the names of professionals mentioned in the articles. We further use German labor market data to assess the potential misrepresentation of a gender for each specific profession. Our findings in fact provide evidence for systematic over-representation of men on all three dimensions. For instance, for professional fields dominated by females, the respective articles on average still feature almost two times more images of men; and on average, 83% of the mentioned names of professionals were male and only 17% female.
Submitted 12 June, 2017;
originally announced June 2017.
-
TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia
Authors:
Fabian Flöck,
Kenan Erdogan,
Maribel Acosta
Abstract:
We present a dataset that contains every instance of all tokens (~ words) ever written in undeleted, non-redirect English Wikipedia articles until October 2016, in total 13,545,349,787 instances. Each token is annotated with (i) the article revision it was originally created in, and (ii) lists with all the revisions in which the token was ever deleted and (potentially) re-added and re-deleted from its article, enabling a complete and straightforward tracking of its history. This data would be exceedingly hard for an average potential user to create, as (i) it is very expensive to compute and (ii) accurately tracking the history of each token in revisioned documents is a non-trivial task. Adapting a state-of-the-art algorithm, we have produced a dataset that allows for a range of analyses and metrics, already popular in research and going beyond, to be generated at complete-Wikipedia scale, ensuring quality and allowing researchers to forgo expensive text-comparison computation, which so far has hindered scalable usage. We show how this data enables, at the token level, computation of provenance, measuring survival of content over time, very detailed conflict metrics, and fine-grained interactions of editors like partial reverts, re-additions and other metrics, in the process gaining several novel insights.
Submitted 23 March, 2017;
originally announced March 2017.
-
Measuring Motivations of Crowdworkers: The Multidimensional Crowdworker Motivation Scale
Authors:
Lisa Posch,
Arnim Bleier,
Clemens Lechner,
Daniel Danner,
Fabian Flöck,
Markus Strohmaier
Abstract:
Crowd employment is a new form of short-term and flexible employment which has emerged during the past decade. In order to understand this new form of employment, it is crucial to illuminate the underlying motivations of the workforce involved in it. This paper introduces the Multidimensional Crowdworker Motivation Scale (MCMS), a scale for measuring the motivation of crowdworkers on micro-task platforms. The MCMS is theoretically grounded in self-determination theory and tailored specifically to the context of paid crowdsourced micro-labor. The scale measures the motivation of crowdworkers along six motivational dimensions, ranging from amotivation to intrinsic motivation. We validated the MCMS on data collected in ten countries and three income groups. Factor analyses demonstrated that the MCMS's six dimensions showed good model fit, validity, and reliability. Furthermore, our measurement invariance tests showed that motivations measured with the MCMS are comparable across countries and income groups, and we present a first cross-country comparison of crowdworker motivations. This work constitutes an important first step towards understanding the motivations of the international crowd workforce.
Submitted 15 March, 2019; v1 submitted 6 February, 2017;
originally announced February 2017.
-
Wikiwhere: An interactive tool for studying the geographical provenance of Wikipedia references
Authors:
Martin Körner,
Tatiana Sennikova,
Florian Windhäuser,
Claudia Wagner,
Fabian Flöck
Abstract:
Wikipedia articles about the same topic in different language editions are built around different sources of information. For example, one can find very different news articles linked as references in the English Wikipedia article titled "Annexation of Crimea by the Russian Federation" than in its German counterpart (determined via Wikipedia's language links). Some of this difference can of course be attributed to the different language proficiencies of readers and editors in separate language editions. Yet, although including English-language news sources seems to be no issue in the German edition, the English references that are listed there do not overlap highly with the ones in the article's English version. Such patterns could be an indicator of bias towards certain national contexts when referencing facts and statements in Wikipedia. However, determining for each reference which national context it can be traced back to, and comparing the link distributions to each other, is infeasible for casual readers or scientists with non-technical backgrounds. Wikiwhere answers the question of where Web references stem from by analyzing and visualizing the geographic location of external reference links that are included in a given Wikipedia article. Instead of relying solely on the IP location of a given URL, our machine learning models consider several features.
Submitted 16 December, 2016; v1 submitted 3 December, 2016;
originally announced December 2016.
-
RDF-Hunter: Automatically Crowdsourcing the Execution of Queries Against RDF Data Sets
Authors:
Maribel Acosta,
Elena Simperl,
Fabian Flöck,
Maria-Esther Vidal,
Rudi Studer
Abstract:
In recent years, a large number of RDF data sets have become available on the Web. However, due to the semi-structured nature of RDF data, missing values affect answer completeness of queries that are posed against this data. To overcome this limitation, we propose RDF-Hunter, a novel hybrid query processing approach that brings together machine and human computation to execute queries against RDF data. We develop a novel quality model and query engine in order to enable RDF-Hunter to decide on the fly which parts of a query should be executed through conventional technology or crowd computing. To evaluate RDF-Hunter, we created a collection of 50 SPARQL queries against the DBpedia data set, executed them using our hybrid query engine, and analyzed the accuracy of the outcomes obtained from the crowd. The experiments clearly show that the overall approach is feasible and produces query results that reliably and significantly enhance completeness of automatic query processing responses.
Submitted 10 March, 2015;
originally announced March 2015.
-
Mining cross-cultural relations from Wikipedia - A study of 31 European food cultures
Authors:
Paul Laufer,
Claudia Wagner,
Fabian Flöck,
Markus Strohmaier
Abstract:
For many people, Wikipedia represents one of the primary sources of knowledge about foreign cultures. Yet, different Wikipedia language editions offer different descriptions of cultural practices. Unveiling diverging representations of cultures provides an important insight, since they may foster the formation of cross-cultural stereotypes, misunderstandings and potentially even conflict. In this work, we explore, using culinary practices as an example, to what extent the descriptions of cultural practices in various European language editions of Wikipedia differ, and propose an approach to mine cultural relations between different language communities through their description of and interest in their own and other communities' food culture. We assess the validity of the extracted relations using 1) various external reference data sources (i.e., the European Social Survey, migration statistics), 2) crowdsourcing methods and 3) simulations.
Submitted 12 July, 2015; v1 submitted 17 November, 2014;
originally announced November 2014.
-
Evolution of Reddit: From the Front Page of the Internet to a Self-referential Community?
Authors:
Philipp Singer,
Fabian Flöck,
Clemens Meinhart,
Elias Zeitfogel,
Markus Strohmaier
Abstract:
In the past few years, Reddit -- a community-driven platform for submitting, commenting and rating links and text posts -- has grown exponentially, from a small community of users into one of the largest online communities on the Web. To the best of our knowledge, this work represents the most comprehensive longitudinal study of Reddit's evolution to date, studying both (i) how user submissions have evolved over time and (ii) how the community's allocation of attention and its perception of submissions have changed over 5 years based on an analysis of almost 60 million submissions. Our work reveals an ever-increasing diversification of topics accompanied by a simultaneous concentration towards a few selected domains both in terms of posted submissions as well as perception and attention. By and large, our investigations suggest that Reddit has transformed itself from a dedicated gateway to the Web to an increasingly self-referential community that focuses on and reinforces its own user-generated image- and textual content over external sources.
Submitted 23 June, 2014; v1 submitted 6 February, 2014;
originally announced February 2014.