Unraveling Code-Mixing Patterns in Migration Discourse: Automated Detection and Analysis of Online Conversations on Reddit
Authors:
Fedor Vitiugin,
Sunok Lee,
Henna Paakki,
Anastasiia Chizhikova,
Nitin Sawhney
Abstract:
The surge in global migration patterns underscores the imperative of integrating migrants seamlessly into host communities, necessitating inclusive and trustworthy public services. Despite the Nordic countries' robust public sector infrastructure, recent immigrants often encounter barriers to accessing these services, exacerbating social disparities and eroding trust. Addressing digital inequaliti…
▽ More
The surge in global migration patterns underscores the imperative of integrating migrants seamlessly into host communities, necessitating inclusive and trustworthy public services. Despite the Nordic countries' robust public sector infrastructure, recent immigrants often encounter barriers to accessing these services, exacerbating social disparities and eroding trust. Addressing digital inequalities and linguistic diversity is paramount in this endeavor. This paper explores the utilization of code-mixing, a communication strategy prevalent among multilingual speakers, in migration-related discourse on social media platforms such as Reddit. We present Ensemble Learning for Multilingual Identification of Code-mixed Texts (ELMICT), a novel approach designed to automatically detect code-mixed messages in migration-related discussions. Leveraging ensemble learning techniques for combining multiple tokenizers' outputs and pre-trained language models, ELMICT demonstrates high performance (with F1 more than 0.95) in identifying code-mixing across various languages and contexts, particularly in cross-lingual zero-shot conditions (with avg. F1 more than 0.70). Moreover, the utilization of ELMICT helps to analyze the prevalence of code-mixing in migration-related threads compared to other thematic categories on Reddit, shedding light on the topics of concern to migrant communities. Our findings reveal insights into the communicative strategies employed by migrants on social media platforms, offering implications for the development of inclusive digital public services and conversational systems. By addressing the research questions posed in this study, we contribute to the understanding of linguistic diversity in migration discourse and pave the way for more effective tools for building trust in multicultural societies.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
Construction of Hyper-Relational Knowledge Graphs Using Pre-Trained Large Language Models
Authors:
Preetha Datta,
Fedor Vitiugin,
Anastasiia Chizhikova,
Nitin Sawhney
Abstract:
Extracting hyper-relations is crucial for constructing comprehensive knowledge graphs, but there are limited supervised methods available for this task. To address this gap, we introduce a zero-shot prompt-based method using OpenAI's GPT-3.5 model for extracting hyper-relational knowledge from text. Comparing our model with a baseline, we achieved promising results, with a recall of 0.77. Although…
▽ More
Extracting hyper-relations is crucial for constructing comprehensive knowledge graphs, but there are limited supervised methods available for this task. To address this gap, we introduce a zero-shot prompt-based method using OpenAI's GPT-3.5 model for extracting hyper-relational knowledge from text. Comparing our model with a baseline, we achieved promising results, with a recall of 0.77. Although our precision is currently lower, a detailed analysis of the model outputs has uncovered potential pathways for future research in this area.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.