
Showing 1–27 of 27 results for author: Sidorov, G

  1. arXiv:2406.16961  [pdf, other]

    cs.LG cs.AI

    Anime Popularity Prediction Before Huge Investments: a Multimodal Approach Using Deep Learning

    Authors: Jesús Armenta-Segura, Grigori Sidorov

    Abstract: In the Japanese anime industry, predicting whether an upcoming product will be popular is crucial. This paper presents a dataset and methods for predicting anime popularity using a multimodal text-image dataset constructed exclusively from freely available internet sources. The dataset was built following rigorous standards based on real-life investment experiences. A deep neural network architectur…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures, 11 tables

  2. A multitask learning framework for leveraging subjectivity of annotators to identify misogyny

    Authors: Jason Angel, Segun Taofeek Aroyehun, Grigori Sidorov, Alexander Gelbukh

    Abstract: Identifying misogyny using artificial intelligence is a form of combating online toxicity against women. However, the subjective nature of interpreting misogyny poses a significant challenge to model the phenomenon. In this paper, we propose a multitask learning approach that leverages the subjectivity of this task to enhance the performance of the misogyny identification systems. We incorporated…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2405.03084  [pdf]

    cs.CL cs.LG

    Analyzing Emotional Trends from X platform using SenticNet: A Comparative Analysis with Cryptocurrency Price

    Authors: Moein Shahiki Tash, Zahra Ahani, Olga Kolesnikova, Grigori Sidorov

    Abstract: This study delves into the relationship between emotional trends from X platform data and the market dynamics of well-known cryptocurrencies Cardano, Binance, Fantom, Matic, and Ripple over the period from October 2022 to March 2023. Leveraging SenticNet, we identified emotions like Fear and Anxiety, Rage and Anger, Grief and Sadness, Delight and Pleasantness, Enthusiasm and Eagerness, and Delight…

    Submitted 5 May, 2024; originally announced May 2024.

  4. arXiv:2401.16541  [pdf, other]

    cs.CL cs.AI

    GuReT: Distinguishing Guilt and Regret related Text

    Authors: Sabur Butt, Fazlourrahman Balouchzahi, Abdul Gafar Manuel Meque, Maaz Amjad, Hector G. Ceballos Cancino, Grigori Sidorov, Alexander Gelbukh

    Abstract: The intricate relationship between human decision-making and emotions, particularly guilt and regret, has significant implications on behavior and well-being. Yet, these emotions' subtle distinctions and interplay are often overlooked in computational models. This paper introduces a dataset tailored to dissect the relationship between guilt and regret and their unique textual markers, filling a not…

    Submitted 29 January, 2024; originally announced January 2024.

  5. arXiv:2401.07414  [pdf, other]

    cs.CL

    Leveraging the power of transformers for guilt detection in text

    Authors: Abdul Gafar Manuel Meque, Jason Angel, Grigori Sidorov, Alexander Gelbukh

    Abstract: In recent years, language models and deep learning techniques have revolutionized natural language processing tasks, including emotion detection. However, the specific emotion of guilt has received limited attention in this field. In this research, we explore the applicability of three transformer-based language models for detecting guilt in text and compare their performance for general emotion d…

    Submitted 14 January, 2024; originally announced January 2024.

  6. arXiv:2311.04189  [pdf]

    cs.CL

    SpaDeLeF: A Dataset for Hierarchical Classification of Lexical Functions for Collocations in Spanish

    Authors: Yevhen Kostiuk, Grigori Sidorov, Olga Kolesnikova

    Abstract: In natural language processing (NLP), a lexical function is a concept, first crafted in the Meaning-Text Theory, for unambiguously representing semantic and syntactic features of words and phrases in text. Hierarchical classification of lexical functions involves organizing these features into a tree-like hierarchy of categories or labels. This is a challenging task as it requires a good understanding of…

    Submitted 7 November, 2023; originally announced November 2023.

  7. arXiv:2306.01261  [pdf, other]

    cs.CL

    Automatic Translation of Hate Speech to Non-hate Speech in Social Media Texts

    Authors: Yevhen Kostiuk, Atnafu Lambebo Tonja, Grigori Sidorov, Olga Kolesnikova

    Abstract: In this paper, we investigate the issue of hate speech by presenting a novel task of translating hate speech into non-hate speech text while preserving its meaning. As a case study, we use Spanish texts. We provide a dataset and several baselines as a starting point for further research in the task. We evaluated our baseline results using multiple metrics, including BLEU scores. The aim of this st…

    Submitted 2 June, 2023; originally announced June 2023.

  8. arXiv:2305.17406  [pdf, other]

    cs.CL

    Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models

    Authors: Atnafu Lambebo Tonja, Hellina Hailu Nigatu, Olga Kolesnikova, Grigori Sidorov, Alexander Gelbukh, Jugal Kalita

    Abstract: This paper describes CIC NLP's submission to the AmericasNLP 2023 Shared Task on machine translation systems for indigenous languages of the Americas. We present the system descriptions for three methods. We used two multilingual models, namely M2M-100 and mBART50, and one bilingual (one-to-one) -- Helsinki NLP Spanish-English translation model, and experimented with different transfer learning se…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to Third Workshop on NLP for Indigenous Languages of the Americas

  9. arXiv:2305.17404  [pdf, other]

    cs.CL

    Parallel Corpus for Indigenous Language Translation: Spanish-Mazatec and Spanish-Mixtec

    Authors: Atnafu Lambebo Tonja, Christian Maldonado-Sifuentes, David Alejandro Mendoza Castillo, Olga Kolesnikova, Noé Castro-Sánchez, Grigori Sidorov, Alexander Gelbukh

    Abstract: In this paper, we present a parallel Spanish-Mazatec and Spanish-Mixtec corpus for machine translation (MT) tasks, where Mazatec and Mixtec are two indigenous Mexican languages. We evaluated the usability of the collected corpus using three different approaches: transformer, transfer learning, and fine-tuning pre-trained multilingual MT models. Fine-tuning the Facebook M2M100-48 model outperformed…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to Third Workshop on NLP for Indigenous Languages of the Americas

  10. arXiv:2303.07292   

    cs.CL

    Transformer-based approaches to Sentiment Detection

    Authors: Olumide Ebenezer Ojo, Hoang Thang Ta, Alexander Gelbukh, Hiram Calvo, Olaronke Oluwayemisi Adebanji, Grigori Sidorov

    Abstract: The use of transfer learning methods is largely responsible for the present breakthrough in Natural Language Processing (NLP) tasks across multiple domains. In order to solve the problem of sentiment detection, we examined the performance of four different types of well-known state-of-the-art transformer models for text classification. Models such as Bidirectional Encoder Representations from Tran…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: This submission has been removed from arXiv because the submitter did not have the authority to grant the license at the time of submission

  11. arXiv:2303.03510  [pdf, other]

    cs.CL

    Guilt Detection in Text: A Step Towards Understanding Complex Emotions

    Authors: Abdul Gafar Manuel Meque, Nisar Hussain, Grigori Sidorov, Alexander Gelbukh

    Abstract: We introduce a novel Natural Language Processing (NLP) task called Guilt detection, which focuses on detecting guilt in text. We identify guilt as a complex and vital emotion that has not been previously studied in NLP, and we aim to provide a more fine-grained analysis of it. To address the lack of publicly available corpora for guilt detection, we created VIC, a dataset containing 4622 texts fro…

    Submitted 6 March, 2023; originally announced March 2023.

  12. arXiv:2212.07549  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    ReDDIT: Regret Detection and Domain Identification from Text

    Authors: Fazlourrahman Balouchzahi, Sabur Butt, Grigori Sidorov, Alexander Gelbukh

    Abstract: In this paper, we present a study of regret and its expression on social media platforms. Specifically, we present a novel dataset of Reddit texts that have been classified into three classes: Regret by Action, Regret by Inaction, and No Regret. We then use this dataset to investigate the language used to express regret on Reddit and to identify the domains of text that are most commonly associate…

    Submitted 14 December, 2022; originally announced December 2022.

  13. arXiv:2211.14459  [pdf, other]

    cs.CL cs.AI

    Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts

    Authors: Atnafu Lambebo Tonja, Mesay Gemeda Yigezu, Olga Kolesnikova, Moein Shahiki Tash, Grigori Sidorov, Alexander Gelbukh

    Abstract: The use of code-mixed data in natural language processing (NLP) research is currently receiving a lot of attention. Language identification of social media code-mixed text has been an interesting problem of study in recent years due to the advancement and influences of social media in communication. This paper presents the Instituto Politécnico Nacional, Centro de Investigación en Computación (CIC) team's syst…

    Submitted 25 November, 2022; originally announced November 2022.

  14. arXiv:2211.13014  [pdf, other]

    cs.CL cs.LG

    Sarcasm Detection Framework Using Context, Emotion and Sentiment Features

    Authors: Oxana Vitman, Yevhen Kostiuk, Grigori Sidorov, Alexander Gelbukh

    Abstract: Sarcasm detection is an essential task that can help identify the actual sentiment in user-generated data, such as discussion forums or tweets. Sarcasm is a sophisticated form of linguistic expression because its surface meaning usually contradicts its inner, deeper meaning. Such incongruity is the essential component of sarcasm; however, it makes sarcasm detection quite a challenging task. In thi…

    Submitted 4 January, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

  15. arXiv:2211.09847  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts

    Authors: H. L. Shashirekha, F. Balouchzahi, M. D. Anusha, G. Sidorov

    Abstract: The task of automatically identifying a language used in a given text is called Language Identification (LI). India is a multilingual country and many Indians, especially youths, are comfortable with Hindi and English, in addition to their local languages. Hence, they often use more than one language to post their comments on social media. Texts containing more than one language are called "code-mix…

    Submitted 17 November, 2022; originally announced November 2022.

  16. arXiv:2210.15224  [pdf, other]

    cs.CL

    The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation

    Authors: Tadesse Destaw Belay, Atnafu Lambebo Tonja, Olga Kolesnikova, Seid Muhie Yimam, Abinew Ali Ayele, Silesh Bogale Haile, Grigori Sidorov, Alexander Gelbukh

    Abstract: Machine translation (MT) is one of the main tasks in natural language processing whose objective is to translate texts automatically from one natural language to another. Nowadays, using deep neural networks for MT tasks has received great attention. These networks require lots of data to learn abstract representations of the input and store it in continuous vectors. This paper presents the first…

    Submitted 27 October, 2022; originally announced October 2022.

  17. arXiv:2210.14136  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    PolyHope: Two-Level Hope Speech Detection from Tweets

    Authors: Fazlourrahman Balouchzahi, Grigori Sidorov, Alexander Gelbukh

    Abstract: Hope is characterized as openness of spirit toward the future, a desire, expectation, and wish for something to happen or to be true that remarkably affects human's state of mind, emotions, behaviors, and decisions. Hope is usually associated with concepts of desired expectations and possibility/probability concerning the future. Despite its importance, hope has rarely been studied as a social med…

    Submitted 3 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: 20 pages, 9 figures

  18. arXiv:2210.12659  [pdf]

    cs.CL cs.AI

    Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences

    Authors: Hoang Thang Ta, Alexander Gelbukh, Grigori Sidorov

    Abstract: Acknowledged as one of the most successful online cooperative projects in human society, Wikipedia has obtained rapid growth in recent years and desires continuously to expand content and disseminate knowledge values for everyone globally. The shortage of volunteers brings to Wikipedia many issues, including developing content for over 300 languages at the present. Therefore, the benefit that mach…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: 29 pages

  19. arXiv:2207.12406  [pdf, ps, other]

    cs.CL

    UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu

    Authors: Maaz Amjad, Grigori Sidorov, Alisa Zhila, Alexander Gelbukh, Paolo Rosso

    Abstract: This paper gives the overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language. This is a binary classification task in which the goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing. The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2207.11893

  20. arXiv:2207.11893  [pdf, other]

    cs.CL

    Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020

    Authors: Maaz Amjad, Grigori Sidorov, Alisa Zhila, Alexander Gelbukh, Paolo Rosso

    Abstract: This overview paper describes the first shared task on fake news detection in the Urdu language. The task was posed as a binary classification task, in which the goal is to differentiate between real and fake news. We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing. The dataset contained news in five domains: (i) Health, (ii) Sports, (iii) Sho…

    Submitted 24 July, 2022; originally announced July 2022.

  21. arXiv:2207.06710  [pdf, other]

    cs.CL

    Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021

    Authors: Maaz Amjad, Alisa Zhila, Grigori Sidorov, Andrey Labunets, Sabur Butt, Hamza Imam Amjad, Oxana Vitman, Alexander Gelbukh

    Abstract: With the growth of social media platform influence, the effect of their misuse becomes more and more impactful. The importance of automatic detection of threatening and abusive language cannot be overestimated. However, most of the existing studies and state-of-the-art methods focus on English as the target language, with limited work on low- and medium-resource languages. In this paper, we prese…

    Submitted 14 July, 2022; originally announced July 2022.

  22. arXiv:2207.06223  [pdf, other]

    cs.IR cs.LG

    Job Offers Classifier using Neural Networks and Oversampling Methods

    Authors: Germán Ortiz, Gemma Bel Enguix, Helena Gómez-Adorno, Iqra Ameer, Grigori Sidorov

    Abstract: Both policy and research benefit from a better understanding of individuals' jobs. However, as large-scale administrative records are increasingly employed to represent labor market activity, new automatic methods to classify jobs will become necessary. We developed an automatic job offers classifier using a dataset collected from the largest job bank of Mexico known as Bumeran https://www.bumeran…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: 13 pages, 2 figures, 8th World Conference On Soft Computing

  23. arXiv:2207.05144  [pdf, ps, other]

    cs.CL

    UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu

    Authors: Maaz Amjad, Sabur Butt, Hamza Imam Amjad, Grigori Sidorov, Alisa Zhila, Alexander Gelbukh

    Abstract: This study reports the second shared task, named UrduFake@FIRE2021, on fake news detection in the Urdu language. This is a binary classification problem in which the task is to classify a given news article into two classes: (i) real news, or (ii) fake news. In this shared task, 34 teams from 7 different countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE) registered to part…

    Submitted 11 July, 2022; originally announced July 2022.

  24. arXiv:2207.05133  [pdf, other]

    cs.CL cs.AI

    Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021

    Authors: Maaz Amjad, Sabur Butt, Hamza Imam Amjad, Alisa Zhila, Grigori Sidorov, Alexander Gelbukh

    Abstract: Automatic detection of fake news is a highly important task in the contemporary world. This study reports the 2nd shared task, called UrduFake@FIRE2021, on fake news detection in Urdu. The goal of the shared task is to motivate the community to come up with efficient methods for solving this vital problem, particularly for the Urdu language. The task is posed as a binary classification p…

    Submitted 11 July, 2022; originally announced July 2022.

  25. arXiv:2207.01012  [pdf, other]

    cs.LG cs.CL cs.CY

    Mental Illness Classification on Social Media Texts using Deep Learning and Transfer Learning

    Authors: Iqra Ameer, Muhammad Arif, Grigori Sidorov, Helena Gómez-Adorno, Alexander Gelbukh

    Abstract: Given the current social distance restrictions across the world, most individuals now use social media as their major medium of communication. Millions of people suffering from mental diseases have been isolated due to this, and they are unable to get help in person. They have become more reliant on online venues to express themselves and seek advice on dealing with their mental disorders. Accordi…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: 11 pages, 2 figures, 8th World Conference On Soft Computing

  26. arXiv:2112.03003  [pdf, other]

    cs.CL cs.SI

    What goes on inside rumour and non-rumour tweets and their reactions: A Psycholinguistic Analyses

    Authors: Sabur Butt, Shakshi Sharma, Rajesh Sharma, Grigori Sidorov, Alexander Gelbukh

    Abstract: In recent years, the problem of rumours on online social media (OSM) has attracted lots of attention. Researchers have started investigating from two main directions. The first is the descriptive analysis of rumours and the second is proposing techniques to detect (or classify) rumours. In the descriptive line of works, where researchers have tried to analyse rumours using NLP approaches, there isn't much…

    Submitted 9 November, 2021; originally announced December 2021.

    Comments: 10 pages

  27. arXiv:1710.06524  [pdf, ps, other]

    cs.CL

    Unsupervised Sentence Representations as Word Information Series: Revisiting TF--IDF

    Authors: Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov

    Abstract: Sentence representation at the semantic level is a challenging task for Natural Language Processing and Artificial Intelligence. Despite the advances in word embeddings (i.e. word vector representations), capturing sentence meaning is an open question due to complexities of semantic interactions among words. In this paper, we present an embedding method, which is aimed at learning unsupervised sen…

    Submitted 19 October, 2017; v1 submitted 17 October, 2017; originally announced October 2017.