Search | arXiv e-print repository

doi 10.3233/JIFS-219355

A multitask learning framework for leveraging subjectivity of annotators to identify misogyny

Authors: Jason Angel, Segun Taofeek Aroyehun, Grigori Sidorov, Alexander Gelbukh

Abstract: Identifying misogyny using artificial intelligence is a form of combating online toxicity against women. However, the subjective nature of interpreting misogyny poses a significant challenge to model the phenomenon. In this paper, we propose a multitask learning approach that leverages the subjectivity of this task to enhance the performance of the misogyny identification systems. We incorporated diverse perspectives from annotators in our model design, considering gender and age across six profile groups, and conducted extensive experiments and error analysis using two language models to validate our four alternative designs of the multitask learning technique to identify misogynistic content in English tweets. The results demonstrate that incorporating various viewpoints enhances the language models' ability to interpret different forms of misogyny. This research advances content moderation and highlights the importance of embracing diverse perspectives to build effective online moderation systems.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2404.05556 [pdf, other]

doi 10.1016/j.compfluid.2024.106321

Bathymetry reconstruction from experimental data using PDE-constrained optimisation

Authors: Judith Angel, Jörn Behrens, Sebastian Götschel, Marten Hollm, Daniel Ruprecht, Robert Seifried

Abstract: Knowledge of the bottom topography, also called bathymetry, of rivers, seas or the ocean is important for many areas of maritime science and civil engineering. While direct measurements are possible, they are time-consuming and expensive. Therefore, many approaches have been proposed to infer the bathymetry from measurements of surface waves. Mathematically, this is an inverse problem where an unknown system state needs to be reconstructed from observations with a suitable model for the flow as constraint. In many cases, the shallow water equations can be used to describe the flow. While theoretical studies of the efficacy of such a PDE-constrained optimisation approach for bathymetry reconstruction exist, there seem to be few publications that study its application to data obtained from real-world measurements. This paper shows that the approach can, at least qualitatively, reconstruct a Gaussian-shaped bathymetry in a wave flume from measurements of the water height at up to three points. Achieved normalized root mean square errors (NRMSE) are in line with other approaches.

Submitted 8 April, 2024; originally announced April 2024.

Journal ref: Computers & Fluids 278, pp. 106321, 2024

arXiv:2401.07470 [pdf]

Utilizing deep learning models for the identification of enhancers and super-enhancers based on genomic and epigenomic features

Authors: Zahra Ahani, Moein Shahiki Tash, Yoel Ledo Mezquita, Jason Angel

Abstract: This paper provides an extensive examination of a sizable dataset of English tweets focusing on nine widely recognized cryptocurrencies, specifically Cardano, Binance, Bitcoin, Dogecoin, Ethereum, Fantom, Matic, Shiba, and Ripple. Our primary objective was to conduct a psycholinguistic and emotion analysis of social media content associated with these cryptocurrencies, to enable investigators to make more informed decisions. The study involved comparing linguistic characteristics across the diverse digital coins, shedding light on the distinctive linguistic patterns that emerge within each coin's community. To achieve this, we utilized advanced text analysis techniques. Additionally, our work unveiled an intriguing understanding of the interplay between these digital assets within the cryptocurrency community. By examining which coin pairs are mentioned together most frequently in the dataset, we established correlations between different cryptocurrencies. To ensure the reliability of our findings, we initially gathered a total of 832,559 tweets from Twitter. These tweets underwent a rigorous preprocessing stage, resulting in a refined dataset of 115,899 tweets that were used for our analysis. Overall, our research offers valuable insight into the linguistic nuances of various digital coins' online communities and provides a deeper understanding of their interactions in the cryptocurrency space.

Submitted 14 January, 2024; originally announced January 2024.

Comments: 13 pages, 7 figures, 6 Tables

arXiv:2401.07414 [pdf, other]

Leveraging the power of transformers for guilt detection in text

Authors: Abdul Gafar Manuel Meque, Jason Angel, Grigori Sidorov, Alexander Gelbukh

Abstract: In recent years, language models and deep learning techniques have revolutionized natural language processing tasks, including emotion detection. However, the specific emotion of guilt has received limited attention in this field. In this research, we explore the applicability of three transformer-based language models for detecting guilt in text and compare their performance for general emotion detection and guilt detection. Our proposed model outperformed BERT and RoBERTa models by two and one points respectively. Additionally, we analyze the challenges in developing accurate guilt-detection models and evaluate our model's effectiveness in detecting related emotions like "shame" through qualitative analysis of results.

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2111.10228 [pdf, other]

Impact of spatial coarsening on Parareal convergence

Authors: Judith Angel, Sebastian Götschel, Daniel Ruprecht

Abstract: We study the impact of spatial coarsening on the convergence of the Parareal algorithm, both theoretically and numerically. For initial value problems with a normal system matrix, we prove a lower bound for the Euclidean norm of the iteration matrix. When there is no physical or numerical diffusion, an immediate consequence is that the norm of the iteration matrix cannot be smaller than unity as soon as the coarse problem has fewer degrees of freedom than the fine. This prevents a theoretical guarantee for monotonic convergence, which is necessary to obtain meaningful speedups. For diffusive problems, in the worst case where the iteration error contracts only as fast as the powers of the iteration matrix norm, making Parareal as accurate as the fine method will take about as many iterations as there are processors, making meaningful speedup impossible. Numerical examples with a non-normal system matrix show that for diffusive problems good speedup is possible, but that for non-diffusive problems the negative impact of spatial coarsening on convergence is big.

Submitted 19 November, 2021; originally announced November 2021.

arXiv:2102.05287 [pdf, other]

doi 10.1016/j.jcp.2021.110569

A positivity-preserving high-order weighted compact nonlinear scheme for compressible gas-liquid flows

Authors: Man Long Wong, Jordan B. Angel, Michael F. Barad, Cetin C. Kiris

Abstract: We present a robust, highly accurate, and efficient positivity- and boundedness-preserving diffuse interface method for the simulations of compressible gas-liquid two-phase flows with the five-equation model by Allaire et al. using high-order finite difference weighted compact nonlinear scheme (WCNS) in the explicit form. The equations of state of gas and liquid are given by the ideal gas and stiffened gas laws respectively. Under a mild assumption on the relative magnitude between the ratios of specific heats of the gas and liquid, we can construct limiting procedures for the fifth order incremental-stencil WCNS (WCNS-IS) with the first order Harten-Lax-van Leer contact (HLLC) flux such that positive partial densities and squared speed of sound can be ensured in the solutions, together with bounded volume fractions and mass fractions. The limiting procedures are discretely conservative for all conservative equations in the five-equation model and can also be easily extended for any other conservative finite difference or finite volume scheme. Numerical tests with liquid water and air are reported to demonstrate the robustness and high accuracy of the WCNS-IS with the positivity- and boundedness-preserving limiters even under extreme conditions.

Submitted 10 February, 2021; originally announced February 2021.

arXiv:2011.03760 [pdf, other]

NLP-CIC @ PRELEARN: Mastering prerequisites relations, from handcrafted features to embeddings

Authors: Jason Angel, Segun Taofeek Aroyehun, Alexander Gelbukh

Abstract: We present our systems and findings for the prerequisite relation learning task (PRELEARN) at EVALITA 2020. The task aims to classify whether a pair of concepts hold a prerequisite relation or not. We model the problem using handcrafted features and embedding representations for in-domain and cross-domain scenarios. Our submissions ranked first place in both scenarios with average F1 score of 0.887 and 0.690 respectively across domains on the test sets. We made our code freely available.

Submitted 7 November, 2020; originally announced November 2020.

Comments: Accepted at EVALITA 2020: Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop

ACM Class: I.2.7

arXiv:2011.03755 [pdf, other]

NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora

Authors: Jason Angel, Carlos A. Rodriguez-Diaz, Alexander Gelbukh, Sergio Jimenez

Abstract: We present our systems and findings on unsupervised lexical semantic change for the Italian language in the DIACR-Ita shared-task at EVALITA 2020. The task is to determine whether a target word has evolved its meaning with time, only relying on raw-text from two time-specific datasets. We propose two models representing the target words across the periods to predict the changing words using threshold and voting schemes. Our first model solely relies on part-of-speech usage and an ensemble of distance measures. The second model uses word embedding representation to extract the neighbor's relative distances across spaces and proposes "the average of absolute differences" to estimate lexical semantic change. Our models achieved competent results, ranking third in the DIACR-Ita competition. Furthermore, we experiment with the k_neighbor parameter of our second model to compare the impact of using "the average of absolute differences" versus the cosine distance used in Hamilton et al. (2016).

Submitted 7 November, 2020; originally announced November 2020.

Comments: Accepted at EVALITA 2020: Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop

ACM Class: I.2.7

arXiv:2009.03397 [pdf, other]

NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier

Authors: Jason Angel, Segun Taofeek Aroyehun, Antonio Tamayo, Alexander Gelbukh

Abstract: Code-switching is a phenomenon in which two or more languages are used in the same message. Nowadays, it is quite common to find messages with languages mixed in social media. This phenomenon presents a challenge for sentiment analysis. In this paper, we use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages. Our simple approach achieved an F1-score of 0.71 on the test set of the competition. We analyze our best model's capabilities and perform error analysis to expose important difficulties for classifying sentiment in a code-switching setting.

Submitted 7 September, 2020; originally announced September 2020.

Comments: Accepted at SemEval-2020, COLING

ACM Class: I.2.7

arXiv:1804.03342 [pdf, other]

Toward Formalizing Teleportation of Pedagogical Artificial Agents

Authors: John Angel, Naveen Sundar Govindarajulu, Selmer Bringsjord

Abstract: Our paradigm for the use of artificial agents to teach requires among other things that they persist through time in their interaction with human students, in such a way that they "teleport" or "migrate" from an embodiment at one time t to a different embodiment at later time t'. In this short paper, we report on initial steps toward the formalization of such teleportation, in order to enable an overseeing AI system to establish, mechanically, and verifiably, that the human students in question will likely believe that the very same artificial agent has persisted across such times despite the different embodiments.

Submitted 10 April, 2018; originally announced April 2018.

Showing 1–10 of 10 results for author: Angel, J