-
Cost-Efficient Subjective Task Annotation and Modeling through Few-Shot Annotator Adaptation
Authors:
Preni Golazizian,
Ali Omrani,
Alireza S. Ziabari,
Morteza Dehghani
Abstract:
In subjective NLP tasks, where a single ground truth does not exist, the inclusion of diverse annotators becomes crucial as their unique perspectives significantly influence the annotations. In realistic scenarios, the annotation budget often becomes the main determinant of the number of perspectives (i.e., annotators) included in the data and subsequent modeling. We introduce a novel framework fo…
▽ More
In subjective NLP tasks, where a single ground truth does not exist, the inclusion of diverse annotators becomes crucial as their unique perspectives significantly influence the annotations. In realistic scenarios, the annotation budget often becomes the main determinant of the number of perspectives (i.e., annotators) included in the data and subsequent modeling. We introduce a novel framework for annotation collection and modeling in subjective tasks that aims to minimize the annotation budget while maximizing the predictive performance for each annotator. Our framework has a two-stage design: first, we rely on a small set of annotators to build a multitask model, and second, we augment the model for a new perspective by strategically annotating a few samples per annotator. To test our framework at scale, we introduce and release a unique dataset, Moral Foundations Subjective Corpus, of 2000 Reddit posts annotated by 24 annotators for moral sentiment. We demonstrate that our framework surpasses the previous SOTA in capturing the annotators' individual perspectives with as little as 25% of the original annotation budget on two datasets. Furthermore, our framework results in more equitable models, reducing the performance disparity among annotators.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Towards a Unified Framework for Adaptable Problematic Content Detection via Continual Learning
Authors:
Ali Omrani,
Alireza S. Ziabari,
Preni Golazizian,
Jeffrey Sorensen,
Morteza Dehghani
Abstract:
Detecting problematic content, such as hate speech, is a multifaceted and ever-changing task, influenced by social dynamics, user populations, diversity of sources, and evolving language. There has been significant efforts, both in academia and in industry, to develop annotated resources that capture various aspects of problematic content. Due to researchers' diverse objectives, the annotations ar…
▽ More
Detecting problematic content, such as hate speech, is a multifaceted and ever-changing task, influenced by social dynamics, user populations, diversity of sources, and evolving language. There has been significant efforts, both in academia and in industry, to develop annotated resources that capture various aspects of problematic content. Due to researchers' diverse objectives, the annotations are inconsistent and hence, reports of progress on detection of problematic content are fragmented. This pattern is expected to persist unless we consolidate resources considering the dynamic nature of the problem. We propose integrating the available resources, and leveraging their dynamic nature to break this pattern. In this paper, we introduce a continual learning benchmark and framework for problematic content detection comprising over 84 related tasks encompassing 15 annotation schemas from 8 sources. Our benchmark creates a novel measure of progress: prioritizing the adaptability of classifiers to evolving tasks over excelling in specific tasks. To ensure the continuous relevance of our framework, we designed it so that new tasks can easily be integrated into the benchmark. Our baseline results demonstrate the potential of continual learning in capturing the evolving content and adapting to novel manifestations of problematic content.
△ Less
Submitted 28 September, 2023;
originally announced September 2023.
-
High Dynamic Range Imaging via Visual Attention Modules
Authors:
Ali Reza Omrani,
Davide Moroni
Abstract:
Thanks to High Dynamic Range (HDR) imaging methods, the scope of photography has seen profound changes recently. To be more specific, such methods try to reconstruct the lost luminosity of the real world caused by the limitation of regular cameras from the Low Dynamic Range (LDR) images. Additionally, although the State-Of-The-Art methods in this topic perform well, they mainly concentrate on comb…
▽ More
Thanks to High Dynamic Range (HDR) imaging methods, the scope of photography has seen profound changes recently. To be more specific, such methods try to reconstruct the lost luminosity of the real world caused by the limitation of regular cameras from the Low Dynamic Range (LDR) images. Additionally, although the State-Of-The-Art methods in this topic perform well, they mainly concentrate on combining different exposures and have less attention to extracting the informative parts of the images. Thus, this paper aims to introduce a new model capable of incorporating information from the most visible areas of each image extracted by a visual attention module (VAM), which is a result of a segmentation strategy. In particular, the model, based on a deep learning architecture, utilizes the extracted areas to produce the final HDR image. The results demonstrate that our method outperformed most of the State-Of-The-Art algorithms.
△ Less
Submitted 27 July, 2023;
originally announced July 2023.
-
Playing with Data: An Augmented Reality Approach to Interact with Visualizations of Industrial Process Tomography
Authors:
Yuchong Zhang,
Yueming Xuan,
Rahul Yadav,
Adel Omrani,
Morten Fjeld
Abstract:
Industrial process tomography (IPT) is a specialized imaging technique widely used in industrial scenarios for process supervision and control. Today, augmented/mixed reality (AR/MR) is increasingly being adopted in many industrial occasions, even though there is still an obvious gap when it comes to IPT. To bridge this gap, we propose the first systematic AR approach using optical see-through (OS…
▽ More
Industrial process tomography (IPT) is a specialized imaging technique widely used in industrial scenarios for process supervision and control. Today, augmented/mixed reality (AR/MR) is increasingly being adopted in many industrial occasions, even though there is still an obvious gap when it comes to IPT. To bridge this gap, we propose the first systematic AR approach using optical see-through (OST) head mounted displays (HMDs) with comparative evaluation for domain users towards IPT visualization analysis. The proof-of-concept was demonstrated by a within-subject user study (n=20) with counterbalancing design. Both qualitative and quantitative measurements were investigated. The results showed that our AR approach outperformed conventional settings for IPT data visualization analysis in bringing higher understandability, reduced task completion time, lower error rates for domain tasks, increased usability with enhanced user experience, and a better recommendation level. We summarize the findings and suggest future research directions for benefiting IPT users with AR/MR.
△ Less
Submitted 10 March, 2024; v1 submitted 3 February, 2023;
originally announced February 2023.
-
Supervised Image Segmentation for High Dynamic Range Imaging
Authors:
Ali Reza Omrani,
Davide Moroni
Abstract:
Regular cameras and cell phones are able to capture limited luminosity. Thus, in terms of quality, most of the produced images from such devices are not similar to the real world. They are overly dark or too bright, and the details are not perfectly visible. Various methods, which fall under the name of High Dynamic Range (HDR) Imaging, can be utilised to cope with this problem. Their objective is…
▽ More
Regular cameras and cell phones are able to capture limited luminosity. Thus, in terms of quality, most of the produced images from such devices are not similar to the real world. They are overly dark or too bright, and the details are not perfectly visible. Various methods, which fall under the name of High Dynamic Range (HDR) Imaging, can be utilised to cope with this problem. Their objective is to produce an image with more details. However, unfortunately, most methods for generating an HDR image from Multi-Exposure images only concentrate on how to combine different exposures and do not have any focus on choosing the best details of each image. Therefore, it is strived in this research to extract the most visible areas of each image with the help of image segmentation. Two methods of producing the Ground Truth were considered, as manual threshold and Otsu threshold, and a neural network will be used to train segment these areas. Finally, it will be shown that the neural network is able to segment the visible parts of pictures acceptably.
△ Less
Submitted 6 December, 2022;
originally announced December 2022.
-
Social-Group-Agnostic Word Embedding Debiasing via the Stereotype Content Model
Authors:
Ali Omrani,
Brendan Kennedy,
Mohammad Atari,
Morteza Dehghani
Abstract:
Existing word embedding debiasing methods require social-group-specific word pairs (e.g., "man"-"woman") for each social attribute (e.g., gender), which cannot be used to mitigate bias for other social groups, making these methods impractical or costly to incorporate understudied social groups in debiasing. We propose that the Stereotype Content Model (SCM), a theoretical framework developed in so…
▽ More
Existing word embedding debiasing methods require social-group-specific word pairs (e.g., "man"-"woman") for each social attribute (e.g., gender), which cannot be used to mitigate bias for other social groups, making these methods impractical or costly to incorporate understudied social groups in debiasing. We propose that the Stereotype Content Model (SCM), a theoretical framework developed in social psychology for understanding the content of stereotypes, which structures stereotype content along two psychological dimensions - "warmth" and "competence" - can help debiasing efforts to become social-group-agnostic by capturing the underlying connection between bias and stereotypes. Using only pairs of terms for warmth (e.g., "genuine"-"fake") and competence (e.g.,"smart"-"stupid"), we perform debiasing with established methods and find that, across gender, race, and age, SCM-based debiasing performs comparably to group-specific debiasing
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
The Moral Foundations Reddit Corpus
Authors:
Jackson Trager,
Alireza S. Ziabari,
Aida Mostafazadeh Davani,
Preni Golazizian,
Farzan Karimi-Malekabadi,
Ali Omrani,
Zhihe Li,
Brendan Kennedy,
Nils Karl Reimer,
Melissa Reyes,
Kelsey Cheng,
Mellow Wei,
Christina Merrifield,
Arta Khosravi,
Evans Alvarez,
Morteza Dehghani
Abstract:
Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, pro-environmental action, political engagement, and even participation in violent protests. Various computational methods in Natural Language Processing (NLP) have been used to detect moral sentiment from textual data, but in order to achieve better performances in such subjective tasks, large set…
▽ More
Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, pro-environmental action, political engagement, and even participation in violent protests. Various computational methods in Natural Language Processing (NLP) have been used to detect moral sentiment from textual data, but in order to achieve better performances in such subjective tasks, large sets of hand-annotated training data are needed. Previous corpora annotated for moral sentiment have proven valuable, and have generated new insights both within NLP and across the social sciences, but have been limited to Twitter. To facilitate improving our understanding of the role of moral rhetoric, we present the Moral Foundations Reddit Corpus, a collection of 16,123 Reddit comments that have been curated from 12 distinct subreddits, hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework. We use a range of methodologies to provide baseline moral-sentiment classification results for this new corpus, e.g., cross-domain classification and knowledge transfer.
△ Less
Submitted 17 August, 2022; v1 submitted 10 August, 2022;
originally announced August 2022.
-
Improving Counterfactual Generation for Fair Hate Speech Detection
Authors:
Aida Mostafazadeh Davani,
Ali Omrani,
Brendan Kennedy,
Mohammad Atari,
Xiang Ren,
Morteza Dehghani
Abstract:
Bias mitigation approaches reduce models' dependence on sensitive features of data, such as social group tokens (SGTs), resulting in equal predictions across the sensitive features. In hate speech detection, however, equalizing model predictions may ignore important differences among targeted social groups, as hate speech can contain stereotypical language specific to each SGT. Here, to take the s…
▽ More
Bias mitigation approaches reduce models' dependence on sensitive features of data, such as social group tokens (SGTs), resulting in equal predictions across the sensitive features. In hate speech detection, however, equalizing model predictions may ignore important differences among targeted social groups, as hate speech can contain stereotypical language specific to each SGT. Here, to take the specific language about each SGT into account, we rely on counterfactual fairness and equalize predictions among counterfactuals, generated by changing the SGTs. Our method evaluates the similarity in sentence likelihoods (via pre-trained language models) among counterfactuals, to treat SGTs equally only within interchangeable contexts. By applying logit pairing to equalize outcomes on the restricted set of counterfactuals for each instance, we improve fairness metrics while preserving model performance on hate speech detection.
△ Less
Submitted 3 August, 2021;
originally announced August 2021.
-
Fair Hate Speech Detection through Evaluation of Social Group Counterfactuals
Authors:
Aida Mostafazadeh Davani,
Ali Omrani,
Brendan Kennedy,
Mohammad Atari,
Xiang Ren,
Morteza Dehghani
Abstract:
Approaches for mitigating bias in supervised models are designed to reduce models' dependence on specific sensitive features of the input data, e.g., mentioned social groups. However, in the case of hate speech detection, it is not always desirable to equalize the effects of social groups because of their essential role in distinguishing outgroup-derogatory hate, such that particular types of hate…
▽ More
Approaches for mitigating bias in supervised models are designed to reduce models' dependence on specific sensitive features of the input data, e.g., mentioned social groups. However, in the case of hate speech detection, it is not always desirable to equalize the effects of social groups because of their essential role in distinguishing outgroup-derogatory hate, such that particular types of hateful rhetoric carry the intended meaning only when contextualized around certain social group tokens. Counterfactual token fairness for a mentioned social group evaluates the model's predictions as to whether they are the same for (a) the actual sentence and (b) a counterfactual instance, which is generated by changing the mentioned social group in the sentence. Our approach assures robust model predictions for counterfactuals that imply similar meaning as the actual sentence. To quantify the similarity of a sentence and its counterfactual, we compare their likelihood score calculated by generative language models. By equalizing model behaviors on each sentence and its counterfactuals, we mitigate bias in the proposed model while preserving the overall classification performance.
△ Less
Submitted 24 October, 2020;
originally announced October 2020.