-
Dataset of Multi-aspect Integrated Migration Indicators
Authors:
D. Goglia,
L. Pollacci,
A. Sirbu
Abstract:
Nowadays, new branches of research are proposing the use of non-traditional data sources for the study of migration trends in order to find an original methodology to answer open questions about the human mobility framework. In this context we present the Multi-aspect Integrated Migration Indicators (MIMI) dataset, a new dataset of migration drivers, resulting from the process of acquiring, transforming and merging both official data about international flows and stocks and original indicators not typically used in migration studies, such as online social networks. This work describes the process of gathering, embedding and merging traditional and novel features, resulting in this new multidisciplinary dataset that we believe could significantly contribute to nowcasting and forecasting both present and future bilateral migration trends.
Submitted 2 May, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Measuring the Salad Bowl: Superdiversity on Twitter
Authors:
Laura Pollacci,
Alina Sirbu,
Fosca Giannotti,
Dino Pedreschi
Abstract:
Superdiversity refers to large cultural diversity in a population due to immigration. In this paper, we introduce a superdiversity index based on the changes in the emotional content of words used by a multi-cultural community, compared to the standard language. To compute our index we use Twitter data and we develop an algorithm to extend a dictionary for lexicon-based sentiment analysis. We validate our index by comparing it with official immigration statistics available from the European Commission's Joint Research Center, through the D4I data challenge. We show that, in general, our measure correlates with immigration rates, at various geographical resolutions. Our method produces very good results across languages, being tested here both on English and Italian tweets. We argue that our index has predictive power in regions where exact data on immigration is not available, paving the way for a nowcasting model of immigration rates.
Submitted 22 April, 2022;
originally announced April 2022.
-
EMAKG: An Enhanced Version Of The Microsoft Academic Knowledge Graph
Authors:
Laura Pollacci
Abstract:
Scholarly knowledge graphs are valuable sources of information in several research fields. Despite the number of existing datasets related to publications and researchers, resource quality, coverage and accessibility are still limited. This article presents the Enhanced Microsoft Academic Knowledge Graph, a large dataset of information about scientific publications and involved entities, and the methods developed to build it. Data includes geographical information, researchers' collaborative networks and movements between institutions, academic-related metrics, and linguistic features. The dataset merges information from several data sources and has high temporal and spatial coverage, allowing several use cases.
Submitted 17 March, 2022;
originally announced March 2022.