Tracking Knowledge Propagation Across Wikipedia Languages

Valentim, Rodolfo; Comarela, Giovanni; Park, Souneil; Saez-Trumper, Diego

arXiv:2103.16613 (cs)

[Submitted on 30 Mar 2021]

Abstract: In this paper, we present a dataset of inter-language knowledge propagation in Wikipedia. Covering all 309 language editions and 33M articles, the dataset aims to track the full propagation history of Wikipedia concepts and to enable follow-up research on building predictive models of that propagation. For this purpose, we align all Wikipedia articles in a language-agnostic manner according to the concept they cover, which results in 13M propagation instances. To the best of our knowledge, this dataset is the first to explore full inter-language propagation at a large scale. Together with the dataset, we provide a holistic overview of the propagation and key insights about the underlying structural factors to aid future research. For example, we find that although long cascades are unusual, propagation tends to continue further once it reaches more than four language editions. We also find that the size of language editions is associated with the speed of propagation. We believe the dataset not only contributes to the prior literature on Wikipedia growth but also enables new use cases such as edit recommendation for addressing knowledge gaps, detection of disinformation, and cultural relationship analysis.
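
As a rough illustration of the alignment idea in the abstract, the sketch below groups articles by the concept they cover and orders each group by creation date to obtain a propagation cascade. It is not the authors' pipeline: the record layout, the concept identifiers, the dates, the helper name `build_cascades`, and the rule that every language after the first counts as one propagation are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (concept_id, language_edition, article_creation_date).
# The identifiers and dates are invented for illustration only.
records = [
    ("C1", "en", date(2010, 1, 5)),
    ("C1", "es", date(2011, 3, 2)),
    ("C1", "pt", date(2010, 7, 19)),
    ("C2", "de", date(2015, 6, 1)),
]


def build_cascades(records):
    """Group articles by concept and order each group by creation date."""
    by_concept = defaultdict(list)
    for concept_id, lang, created in records:
        by_concept[concept_id].append((created, lang))
    # Sorting the (date, language) pairs yields the order in which
    # each concept appeared across language editions.
    return {
        concept: [lang for _, lang in sorted(items)]
        for concept, items in by_concept.items()
    }


cascades = build_cascades(records)

# Assumption: each language edition after the first counts as one propagation.
propagations = sum(len(langs) - 1 for langs in cascades.values())

print(cascades)
print(propagations)
```

Under these assumptions, a concept that exists in only one language edition contributes a cascade of length one and no propagation, which is one plausible reason the number of propagation instances can be smaller than the number of articles.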

Subjects:	Computers and Society (cs.CY)
Cite as:	arXiv:2103.16613 [cs.CY]
	(or arXiv:2103.16613v1 [cs.CY] for this version)
	https://doi.org/10.48550/arXiv.2103.16613
Journal reference:	15th International Conference on Web and Social Media (ICWSM-21), 2021

Submission history

From: Diego Saez-Trumper
[v1] Tue, 30 Mar 2021 18:36:13 UTC (8,077 KB)