Search | arXiv e-print repository

LiveData -- A Worldwide Data Mesh for Stratified Data

Authors: Simone Bocca, Amarsanaa Ganbold, Tsolmon Zundui

Abstract: Data reuse is fundamental for reducing the data integration effort required to build data supporting new applications, especially in data scarcity contexts. However, data reuse requires to deal with data heterogeneity, which is always present in data coming from different sources. Such heterogeneity appears at different levels, like the language used by the data, the structure of the information i… ▽ More Data reuse is fundamental for reducing the data integration effort required to build data supporting new applications, especially in data scarcity contexts. However, data reuse requires to deal with data heterogeneity, which is always present in data coming from different sources. Such heterogeneity appears at different levels, like the language used by the data, the structure of the information it represents, and the data types and formats adopted by the datasets. Despite the valuable insights gained by reusing data across contexts, dealing with data heterogeneity is still a high price to pay. Additionally, data reuse is hampered by the lack of data distribution infrastructures supporting the production and distribution of quality and interoperable data. These issues affecting data reuse are amplified considering cross-country data reuse, where geographical and cultural differences are more pronounced. In this paper, we propose LiveData, a cross-country data distribution network handling high quality and diversity-aware data. LiveData is composed by different nodes having an architecture providing components for the generation and distribution of a new type of data, where heterogeneity is transformed into information diversity and considered as a feature, explicitly defined and used to satisfy the data users purposes. This paper presents the specification of the LiveData network, by defining the architecture and the type of data handled by its nodes. This specification is currently being used to implement a concrete use case for data reuse and integration between the University of Trento (Italy) and the National University of Mongolia. △ Less

Submitted 27 May, 2024; originally announced July 2024.

Comments: Accepted to MMT-2024 Mongolian conference and ICTfocus journal (https://ictfocus.org/)

arXiv:2309.10681 [pdf]

Social Interactions Mediated by the Internet and the Big- Five: a Cross-Country Analysis

Authors: Andrea Mercado, Alethia Hume, Ivanno Bison, Fausto Giunchiglia, Amarsanaa Ganbold, Luca Cernuzzi

Abstract: This study analyzes the possible relationship between personality traits, in terms of Big Five (extraversion, agreeableness, responsibility, emotional stability and openness to experience), and social interactions mediated by digital platforms in different socioeconomic and cultural contexts. We considered data from a questionnaire and the experience of using a chatbot, as a mean of requesting and… ▽ More This study analyzes the possible relationship between personality traits, in terms of Big Five (extraversion, agreeableness, responsibility, emotional stability and openness to experience), and social interactions mediated by digital platforms in different socioeconomic and cultural contexts. We considered data from a questionnaire and the experience of using a chatbot, as a mean of requesting and offering help, with students from 4 universities: University of Trento (Italy), the National University of Mongolia, the School of Economics of London (United Kingdom) and the Universidad Católica Nuestra Señora de la Asunción (Paraguay). The main findings confirm that personality traits may influence social interactions and active participation in groups. Therefore, they should be taken into account to enrich the recommendation of matching algorithms between people who ask for help and people who could respond not only on the basis of their knowledge and skills. △ Less

Submitted 19 September, 2023; originally announced September 2023.

Comments: 5 pages

MSC Class: 68U99 ACM Class: J.4

arXiv:2306.05884 [pdf]

Social interactions mediated by the Internet and the Big-Five: a cross-country analysis

Authors: Andrea Mercado, Alethia Hume, Ivano Bison, Fausto Giunchiglia, Amarsanaa Ganbold, Luca Cernuzzi

Abstract: This study analyzes the possible relationship between personality traits, in terms of Big Five (extraversion, agreeableness, responsibility, emotional stability and openness to experience), and social interactions mediated by digital platforms in different socioeconomic and cultural contexts. We considered data from a questionnaire and the experience of using a chatbot, as a mean of requesting and… ▽ More This study analyzes the possible relationship between personality traits, in terms of Big Five (extraversion, agreeableness, responsibility, emotional stability and openness to experience), and social interactions mediated by digital platforms in different socioeconomic and cultural contexts. We considered data from a questionnaire and the experience of using a chatbot, as a mean of requesting and offering help, with students from 4 universities: University of Trento (Italy), the National University of Mongolia, the School of Economics of London (United Kingdom) and the Universidad Católica Nuestra Señora de la Asunción (Paraguay). The main findings confirm that personality traits may influence social interactions and active participation in groups. Therefore, they should be taken into account to enrich the recommendation of matching algorithms between people who ask for help and people who could respond not only on the basis of their knowledge and skills. △ Less

Submitted 9 June, 2023; originally announced June 2023.

Comments: Diversity-aware Hybrid Human-Artificial Intelligence (DHHAI)2023 workshop, 5 pages

arXiv:2302.08591 [pdf, other]

doi 10.1145/3544548.3581190

Complex Daily Activities, Country-Level Diversity, and Smartphone Sensing: A Study in Denmark, Italy, Mongolia, Paraguay, and UK

Authors: Karim Assi, Lakmal Meegahapola, William Droz, Peter Kun, Amalia de Gotzen, Miriam Bidoglia, Sally Stares, George Gaskell, Altangerel Chagnaa, Amarsanaa Ganbold, Tsolmon Zundui, Carlo Caprini, Daniele Miorandi, Alethia Hume, Jose Luis Zarza, Luca Cernuzzi, Ivano Bison, Marcelo Dario Rodas Britez, Matteo Busso, Ronald Chenu-Abente, Fausto Giunchiglia, Daniel Gatica-Perez

Abstract: Smartphones enable understanding human behavior with activity recognition to support people's daily lives. Prior studies focused on using inertial sensors to detect simple activities (sitting, walking, running, etc.) and were mostly conducted in homogeneous populations within a country. However, people are more sedentary in the post-pandemic world with the prevalence of remote/hybrid work/study se… ▽ More Smartphones enable understanding human behavior with activity recognition to support people's daily lives. Prior studies focused on using inertial sensors to detect simple activities (sitting, walking, running, etc.) and were mostly conducted in homogeneous populations within a country. However, people are more sedentary in the post-pandemic world with the prevalence of remote/hybrid work/study settings, making detecting simple activities less meaningful for context-aware applications. Hence, the understanding of (i) how multimodal smartphone sensors and machine learning models could be used to detect complex daily activities that can better inform about people's daily lives and (ii) how models generalize to unseen countries, is limited. We analyzed in-the-wild smartphone data and over 216K self-reports from 637 college students in five countries (Italy, Mongolia, UK, Denmark, Paraguay). Then, we defined a 12-class complex daily activity recognition task and evaluated the performance with different approaches. We found that even though the generic multi-country approach provided an AUROC of 0.70, the country-specific approach performed better with AUROC scores in [0.79-0.89]. We believe that research along the lines of diversity awareness is fundamental for advancing human behavior understanding through smartphones and machine learning, for more real-world utility across countries. △ Less

Submitted 16 February, 2023; originally announced February 2023.

Comments: ACM CHI 2023

arXiv:2211.03009 [pdf, other]

doi 10.1145/3569483

Generalization and Personalization of Mobile Sensing-Based Mood Inference Models: An Analysis of College Students in Eight Countries

Authors: Lakmal Meegahapola, William Droz, Peter Kun, Amalia de Gotzen, Chaitanya Nutakki, Shyam Diwakar, Salvador Ruiz Correa, Donglei Song, Hao Xu, Miriam Bidoglia, George Gaskell, Altangerel Chagnaa, Amarsanaa Ganbold, Tsolmon Zundui, Carlo Caprini, Daniele Miorandi, Alethia Hume, Jose Luis Zarza, Luca Cernuzzi, Ivano Bison, Marcelo Rodas Britez, Matteo Busso, Ronald Chenu-Abente, Can Gunel, Fausto Giunchiglia , et al. (2 additional authors not shown)

Abstract: Mood inference with mobile sensing data has been studied in ubicomp literature over the last decade. This inference enables context-aware and personalized user experiences in general mobile apps and valuable feedback and interventions in mobile health apps. However, even though model generalization issues have been highlighted in many studies, the focus has always been on improving the accuracies… ▽ More Mood inference with mobile sensing data has been studied in ubicomp literature over the last decade. This inference enables context-aware and personalized user experiences in general mobile apps and valuable feedback and interventions in mobile health apps. However, even though model generalization issues have been highlighted in many studies, the focus has always been on improving the accuracies of models using different sensing modalities and machine learning techniques, with datasets collected in homogeneous populations. In contrast, less attention has been given to studying the performance of mood inference models to assess whether models generalize to new countries. In this study, we collected a mobile sensing dataset with 329K self-reports from 678 participants in eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, UK) to assess the effect of geographical diversity on mood inference models. We define and evaluate country-specific (trained and tested within a country), continent-specific (trained and tested within a continent), country-agnostic (tested on a country not seen on training data), and multi-country (trained and tested with multiple countries) approaches trained on sensor data for two mood inference tasks with population-level (non-personalized) and hybrid (partially personalized) models. We show that partially personalized country-specific models perform the best yielding area under the receiver operating characteristic curve (AUROC) scores of the range 0.78-0.98 for two-class (negative vs. positive valence) and 0.76-0.94 for three-class (negative vs. neutral vs. positive valence) inference. Overall, we uncover generalization issues of mood inference models to new countries and how the geographical similarity of countries might impact mood inference. △ Less

Submitted 5 November, 2022; originally announced November 2022.

Comments: ACM IMWUT 2022, To be presented at ACM Ubicomp 2023

arXiv:2206.07615 [pdf, other]

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

Authors: Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, Viktor Martinović, Kyle Gorman, Zdeněk Žabokrtský, Amarsanaa Ganbold, Šárka Dohnalová, Magda Ševčíková, Kateřina Pelegrinová, Fausto Giunchiglia, Ryan Cotterell, Ekaterina Vylomova

Abstract: The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissi… ▽ More The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams and the best system averaged 97.29% F1 score across all languages, ranging English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian), received 10 system submissions from 3 teams, and the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support any type of future studies, we released all system predictions, the evaluation script, and all gold standard datasets. △ Less

Submitted 15 June, 2022; originally announced June 2022.

Comments: The 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

arXiv:2204.05049 [pdf, other]

Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

Authors: Temuulen Khishigsuren, Gábor Bella, Khuyagbaatar Batsuren, Abed Alhakim Freihat, Nandu Chandran Nair, Amarsanaa Ganbold, Hadi Khalilia, Yamini Chandrashekar, Fausto Giunchiglia

Abstract: This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the phenomenon of diversity through the notions of lexical gap and language-specific word and use a systematic method to infer gaps semi-automatically on a large scale. As a first result obtained for the domain of kinship termino… ▽ More This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the phenomenon of diversity through the notions of lexical gap and language-specific word and use a systematic method to infer gaps semi-automatically on a large scale. As a first result obtained for the domain of kinship terminology, known to be very diverse throughout the world, we publish a lexico-semantic resource consisting of 198 domain concepts, 1,911 words, and 37,370 gaps covering 699 languages. We see potential in the use of resources such as ours for the improvement of a variety of cross-lingual NLP tasks, which we demonstrate through a downstream application for the evaluation of machine translation systems. △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: LREC 2022

Showing 1–7 of 7 results for author: Ganbold, A