-
Syracuse Maps as Non-singular Power-Bounded Transformations and Their Inverse Maps
Authors:
Idris Assani,
Ethan Ebbighausen,
Anand Hande
Abstract:
We prove that the dynamical system $(\mathbb{N}, 2^{\mathbb{N}}, T, μ)$, where $μ$ is a finite measure equivalent to the counting measure, is power-bounded in $L^1(μ)$ if and only if there exists one cycle of the map $T$ and for any $x \in \mathbb{N}$, there exists $k \in \mathbb{N}$ such that $T^k(x)$ is in some cycle of the map $T$. This result has immediate implications for the Collatz Conjecture, and we use it to motivate the study of number theoretic properties of the inverse image $T^{-1}(x)$ for $x \in \mathbb{N}$, where $T$ denotes the Collatz map here. We study similar properties for the related Syracuse maps, comparing them to the Collatz map. We also analyze some structural properties of the inverse image in relation to asymptotic density of the set $\{x \in \mathbb{N} \mid \exists k \in \mathbb{N}: T^k(x) < x\}$.
Submitted 31 January, 2023; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Hypers at ComMA@ICON: Modelling Aggressiveness, Gender Bias and Communal Bias Identification
Authors:
Sean Benhur,
Roshan Nayak,
Kanchana Sivanraju,
Adeep Hande,
Subalalitha Chinnaudayar Navaneethakrishnan,
Ruba Priyadharshini,
Bharathi Raja Chakravarthi
Abstract:
Due to the exponentially increasing reach of social media, it is essential to focus on its negative aspects, as it can potentially divide society and incite people to violence. In this paper, we present the system description of our work on the shared task ComMA@ICON, where the task is to classify how aggressive a sentence is and whether it is gender-biased or communally biased. These three could be primary causes of significant problems in society. As team Hypers, we propose an approach that utilizes different pretrained models with attention and mean pooling methods. We achieved Rank 3 with a 0.223 Instance F1 score on Bengali, Rank 2 with a 0.322 Instance F1 score on the multilingual set, Rank 4 with a 0.129 Instance F1 score on Meitei, and Rank 5 with a 0.336 Instance F1 score on Hindi. The source code and the pretrained models of this work can be found here.
Submitted 13 January, 2022; v1 submitted 31 December, 2021;
originally announced December 2021.
-
Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text
Authors:
Bharathi Raja Chakravarthi,
Ruba Priyadharshini,
Sajeetha Thavareesan,
Dhivya Chinnappa,
Durairaj Thenmozhi,
Elizabeth Sherly,
John P. McCrae,
Adeep Hande,
Rahul Ponnusamy,
Shubhanker Banerjee,
Charangan Vasantharajan
Abstract:
We present the results of the Dravidian-CodeMix shared task held at FIRE 2021, a track on sentiment analysis for Dravidian Languages in Code-Mixed Text. We describe the task, its organization, and the submitted systems. This shared task is the continuation of last year's Dravidian-CodeMix shared task held at FIRE 2020. This year's tasks included code-mixing at the intra-token and inter-token levels. Additionally, apart from Tamil and Malayalam, Kannada was also introduced. We received 22 systems for Tamil-English, 15 systems for Malayalam-English, and 15 for Kannada-English. The top systems for Tamil-English, Malayalam-English, and Kannada-English achieved weighted average F1-scores of 0.711, 0.804, and 0.630, respectively. In summary, the quality and quantity of the submissions show that there is great interest in Dravidian languages in code-mixed settings and that the state of the art in this domain still needs improvement.
Submitted 18 November, 2021;
originally announced November 2021.
-
Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling
Authors:
Adeep Hande,
Karthik Puranik,
Konthala Yasaswini,
Ruba Priyadharshini,
Sajeetha Thavareesan,
Anbukkarasi Sampath,
Kogilavani Shanmugavadivel,
Durairaj Thenmozhi,
Bharathi Raja Chakravarthi
Abstract:
Social media has effectively become the prime hub of communication and digital marketing. As these platforms enable the free manifestation of thoughts and facts in text, images and video, there is an extensive need to screen them to protect individuals and groups from offensive content targeted at them. Our work intends to classify code-mixed social media comments/posts in the Dravidian languages of Tamil, Kannada, and Malayalam. We intend to improve offensive language identification by generating pseudo-labels on the dataset. A custom dataset is constructed by transliterating all the code-mixed texts into the respective Dravidian language, either Kannada, Malayalam, or Tamil, and then generating pseudo-labels for the transliterated dataset. The two datasets are combined using the generated pseudo-labels to create a custom dataset called CMTRA. As Dravidian languages are under-resourced, our approach increases the amount of training data for the language models. We fine-tune several recent pretrained language models on the newly constructed dataset. We extract the pretrained language embeddings and pass them on to recurrent neural networks. We observe that fine-tuning ULMFiT on the custom dataset yields the best results on the code-mixed test sets of all three languages. Our approach yields the best results among the benchmarked models on Tamil-English, achieving a weighted F1-Score of 0.7934 while scoring competitive weighted F1-Scores of 0.9624 and 0.7306 on the code-mixed test sets of Malayalam-English and Kannada-English, respectively.
Submitted 27 August, 2021;
originally announced August 2021.
-
Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021
Authors:
Karthik Puranik,
Adeep Hande,
Ruba Priyadharshini,
Thenmozhi Durairaj,
Anbukkarasi Sampath,
Kingston Pal Thamburaj,
Bharathi Raja Chakravarthi
Abstract:
This paper reports the Machine Translation (MT) systems submitted by the IIITT team for the English->Marathi and English->Irish language pairs of the LoResMT 2021 shared task. The task focuses on obtaining high-quality translations for low-resourced languages such as Irish and Marathi. We fine-tune IndicTrans, a pretrained multilingual NMT model, for English->Marathi, using an external parallel corpus as input for additional training. We use a pretrained Helsinki-NLP Opus MT English->Irish model for the latter language pair. Our approaches yield relatively promising results on the BLEU metric. Under the team name IIITT, our systems ranked 1, 1, and 2 in English->Marathi, Irish->English, and English->Irish, respectively.
Submitted 31 August, 2021; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Hope Speech detection in under-resourced Kannada language
Authors:
Adeep Hande,
Ruba Priyadharshini,
Anbukkarasi Sampath,
Kingston Pal Thamburaj,
Prabakaran Chandran,
Bharathi Raja Chakravarthi
Abstract:
Numerous methods have been developed in recent years to monitor the spread of negativity by eliminating vulgar, offensive, and fierce comments from social media platforms. However, relatively few studies focus on embracing positivity and reinforcing supportive and reassuring content in online forums. Consequently, we propose creating an English-Kannada hope speech dataset, KanHope, and comparing several experiments to benchmark the dataset. The dataset consists of 6,176 user-generated comments in code-mixed Kannada scraped from YouTube and manually annotated as bearing hope speech or not-hope speech. In addition, we introduce DC-BERT4HOPE, a dual-channel model that uses the English translation of KanHope for additional training to promote hope speech detection. The approach achieves a weighted F1-score of 0.756, bettering other models. Henceforth, KanHope aims to instigate research in Kannada while broadly encouraging researchers to take a pragmatic approach towards online content that is encouraging, positive, and supportive.
Submitted 5 December, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Do Images really do the Talking? Analysing the significance of Images in Tamil Troll meme classification
Authors:
Siddhanth U Hegde,
Adeep Hande,
Ruba Priyadharshini,
Sajeetha Thavareesan,
Ratnasingam Sakuntharaj,
Sathiyaraj Thangasamy,
B Bharathi,
Bharathi Raja Chakravarthi
Abstract:
A meme is a piece of media created to share an opinion or emotion across the internet. Due to their popularity, memes have become a new form of communication on social media. However, due to their nature, they are increasingly being used in harmful ways such as trolling and cyberbullying. Various data modelling methods create different possibilities in feature extraction and turning the features into beneficial information. The variety of modalities included in data plays a significant part in predicting the results. We try to explore the significance of visual features of images in classifying memes. Memes are a blend of both image and text, where the text is embedded into the image. We try to classify the memes as troll and non-troll memes based on the images and the text on them. However, the images are to be analysed and combined with the text to increase performance. Our work illustrates different textual analysis methods and contrasting multimodal methods, ranging from simple merging to cross attention, to utilise the best of both worlds: visual and textual features. The fine-tuned cross-lingual language model, XLM, performed the best in textual analysis, and the multimodal transformer performed the best in multimodal analysis.
Submitted 9 August, 2021;
originally announced August 2021.
-
Benchmarking Multi-Task Learning for Sentiment Analysis and Offensive Language Identification in Under-Resourced Dravidian Languages
Authors:
Adeep Hande,
Siddhanth U Hegde,
Ruba Priyadharshini,
Rahul Ponnusamy,
Prasanna Kumar Kumaresan,
Sajeetha Thavareesan,
Bharathi Raja Chakravarthi
Abstract:
Obtaining extensive annotated data for under-resourced languages is challenging, so in this research, we have investigated whether it is beneficial to train models using multi-task learning. Sentiment analysis and offensive language identification share similar discourse properties. The selection of these tasks is motivated by the lack of large labelled data for user-generated code-mixed datasets. This paper works on code-mixed YouTube comments for the Tamil, Malayalam, and Kannada languages. Our framework is applicable to other sequence classification problems irrespective of the size of the datasets. Experiments show that our multi-task learning model can achieve strong results compared with single-task learning while reducing the time and space required to train the models on individual tasks. Analysis of fine-tuned models indicates that multi-task learning is preferable to single-task learning, resulting in a higher weighted F1-score on all three languages. We apply two multi-task learning approaches to three Dravidian languages: Kannada, Malayalam, and Tamil. Maximum scores on Kannada and Malayalam were achieved by mBERT subjected to cross-entropy loss with hard parameter sharing. The best scores on Tamil were achieved by DistilBERT subjected to cross-entropy loss with soft parameter sharing as the architecture type. The best-performing models achieved weighted F1-scores of (66.8% and 90.5%), (59% and 70%), and (62.1% and 75.3%) for Kannada, Malayalam, and Tamil on sentiment analysis and offensive language identification, respectively. The data and approaches discussed in this paper are published on GitHub (https://github.com/SiddhanthHegde/Dravidian-MTL-Benchmarking).
Submitted 9 August, 2021;
originally announced August 2021.
-
UVCE-IIITT@DravidianLangTech-EACL2021: Tamil Troll Meme Classification: You need to Pay more Attention
Authors:
Siddhanth U Hegde,
Adeep Hande,
Ruba Priyadharshini,
Sajeetha Thavareesan,
Bharathi Raja Chakravarthi
Abstract:
Tamil is a Dravidian language that is commonly used and spoken in the southern part of Asia. In the era of social media, memes have been a fun moment in the day-to-day life of people. Here, we try to analyze the true meaning of Tamil memes by categorizing them as troll and non-troll. We propose a model comprising a transformer-transformer architecture that tries to attain state-of-the-art performance by using attention as its main component. The dataset consists of troll and non-troll images with their captions as text. The task is a binary classification task. The objective of the model is to pay more attention to the extracted features and to ignore the noise in both images and text.
Submitted 19 April, 2021;
originally announced April 2021.
-
IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always Hope in Transformers
Authors:
Karthik Puranik,
Adeep Hande,
Ruba Priyadharshini,
Sajeetha Thavareesan,
Bharathi Raja Chakravarthi
Abstract:
In a world filled with serious challenges like climate change, religious and political conflicts, global pandemics, terrorism, and racial discrimination, an internet full of hate speech and abusive and offensive content is the last thing we desire. In this paper, we work to identify and promote positive and supportive content on these platforms. We work with several transformer-based models to classify social media comments as hope speech or not-hope speech in the English, Malayalam, and Tamil languages. This paper describes our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2021, EACL 2021.
Submitted 19 April, 2021;
originally announced April 2021.