-
Sequence Graph Network for Online Debate Analysis
Authors:
Quan Mai,
Susan Gauch,
Douglas Adams,
Miaoqing Huang
Abstract:
Online debates involve a dynamic exchange of ideas over time, where participants need to actively consider their opponents' arguments, respond with counterarguments, reinforce their own points, and introduce more compelling arguments as the discussion unfolds. Modeling such a complex process is not a simple task, as it necessitates the incorporation of both sequential characteristics and the capability to capture interactions effectively. To address this challenge, we employ a sequence-graph approach. Representing the conversation as a graph allows us to effectively model interactions between participants through directed edges. Simultaneously, the propagation of information along these edges in a sequential manner enables us to capture a more comprehensive representation of context. We also introduce a Sequence Graph Attention layer to illustrate the proposed information update scheme. The experimental results show that sequence graph networks achieve superior results to existing methods in online debates.
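As an illustration only, the idea of propagating information sequentially along directed edges can be sketched in Python. This is not the authors' Sequence Graph Attention layer. It assumes that each debate turn is a node with a feature vector, that edges point from earlier turns to the turn that responds to them, and that a node's state is its own features plus a dot-product-attention-weighted sum of its predecessors' already updated states. The names `propagate`, `features`, and `in_edges` are hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def propagate(features: List[List[float]], in_edges: Dict[int, List[int]]) -> List[List[float]]:
    """Update node states in chronological order (list index = time step).

    Each node attends over the already updated states of its predecessors,
    so information flows along directed edges in sequence (hypothetical scheme).
    """
    states: List[List[float]] = []
    for t, x in enumerate(features):
        preds = [p for p in in_edges.get(t, []) if p < t]  # only earlier turns
        if not preds:
            states.append(list(x))
            continue
        weights = softmax([dot(x, states[p]) for p in preds])
        context = [sum(w * states[p][d] for w, p in zip(weights, preds)) for d in range(len(x))]
        states.append([xi + ci for xi, ci in zip(x, context)])
    return states
```

Because turns are processed in time order, a reply chain 0 → 1 → 2 accumulates context: with scalar features of 1.0 for every turn, the states become 1.0, 2.0, and 3.0.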
Submitted 26 June, 2024;
originally announced June 2024.
-
SetBERT: Enhancing Retrieval Performance for Boolean Logic and Set Operation Queries
Authors:
Quan Mai,
Susan Gauch,
Douglas Adams
Abstract:
We introduce SetBERT, a fine-tuned BERT-based model designed to enhance query embeddings for set operations and Boolean logic queries, such as Intersection (AND), Difference (NOT), and Union (OR). SetBERT significantly improves retrieval performance for logic-structured queries, an area where both traditional and neural retrieval methods typically underperform. We propose an innovative use of inversed-contrastive loss, focusing on identifying the negative sentence, and fine-tune BERT with a dataset generated by prompting GPT. Furthermore, we demonstrate that, unlike other BERT-based models, fine-tuning with triplet loss actually degrades performance for this specific task. Our experiments reveal that SetBERT-base not only significantly outperforms BERT-base (up to a 63% improvement in Recall) but also achieves performance comparable to the much larger BERT-large model, despite being only one-third the size.
Submitted 26 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding
Authors:
Thanh-Dat Truong,
Utsav Prabhu,
Dongyi Wang,
Bhiksha Raj,
Susan Gauch,
Jeyamkondan Subbiah,
Khoa Luu
Abstract:
Unsupervised Domain Adaptation has been an efficient approach to transferring the semantic segmentation model across data distributions. Meanwhile, the recent Open-vocabulary Semantic Scene understanding based on large-scale vision language models is effective in open-set settings because it can learn diverse concepts and categories. However, these prior methods fail to generalize across different camera views due to the lack of cross-view geometric modeling. At present, there are limited studies analyzing cross-view learning. To address this problem, we introduce a novel Unsupervised Cross-view Adaptation Learning approach to modeling the geometric structural change across views in Semantic Scene Understanding. First, we introduce a novel Cross-view Geometric Constraint on Unpaired Data to model structural changes in images and segmentation masks across cameras. Second, we present a new Geodesic Flow-based Correlation Metric to efficiently measure the geometric structural changes across camera views. Third, we introduce a novel view-condition prompting mechanism to enhance the view-information modeling of the open-vocabulary segmentation network in cross-view adaptation learning. The experiments on different cross-view adaptation benchmarks have shown the effectiveness of our approach in cross-view modeling, demonstrating that we achieve State-of-the-Art (SOTA) performance compared to prior unsupervised domain adaptation and open-vocabulary semantic segmentation methods.
Submitted 3 June, 2024;
originally announced June 2024.
-
Improving Minority Stress Detection with Emotions
Authors:
Jonathan Ivey,
Susan Gauch
Abstract:
Psychological stress detection is an important task for mental healthcare research, but there has been little prior work investigating the effectiveness of psychological stress models on minority individuals, who are especially vulnerable to poor mental health outcomes. In this work, we use the related task of minority stress detection to evaluate the ability of psychological stress models to understand the language of sexual and gender minorities. We find that traditional psychological stress models underperform on minority stress detection, and we propose using emotion-infused models to reduce that performance disparity. We further demonstrate that multi-task psychological stress models outperform the current state-of-the-art for minority stress detection without directly training on minority stress data. We provide explanatory analysis showing that minority communities have different distributions of emotions than the general population and that emotion-infused models improve the performance of stress models on underrepresented groups because of their effectiveness in low-data environments, and we propose that integrating emotions may benefit underrepresented groups in other mental health detection tasks.
Submitted 29 November, 2023;
originally announced November 2023.
-
Improving Cross-Domain Hate Speech Generalizability with Emotion Knowledge
Authors:
Shi Yin Hong,
Susan Gauch
Abstract:
Reliable automatic hate speech (HS) detection systems must adapt to the in-flow of diverse new data to curtail hate speech. However, hate speech detection systems commonly lack generalizability in identifying hate speech dissimilar to data used in training, impeding their robustness in real-world deployments. In this work, we propose a hate speech generalization framework that leverages emotion knowledge in a multitask architecture to improve the generalizability of hate speech detection in a cross-domain setting. We investigate emotion corpora with varying emotion categorical scopes to determine the best corpus scope for supplying emotion knowledge to foster generalized hate speech detection. We further assess how using pretrained Transformer models adapted for hate speech affects our emotion-enriched hate speech generalization model. We perform extensive experiments on six publicly available datasets sourced from different online domains and show that our emotion-enriched HS detection generalization method demonstrates consistent generalization improvement in cross-domain evaluation, increasing generalization performance up to 18.1% and average cross-domain performance up to 8.5%, according to the F1 measure.
Submitted 17 December, 2023; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Multidimensional Fairness in Paper Recommendation
Authors:
Reem Alsaffar,
Susan Gauch,
Hiba Al-Kawaz
Abstract:
To prevent potential bias in the paper review and selection process, most conferences and journals use double-blind review. Despite this, studies show that bias still exists. Recommendation algorithms for paper review may also have implicit bias. To address this, we offer three fair methods that specifically take author diversity into account in paper recommendation. Our methods provide fair outcomes across many protected variables concurrently, in contrast to typical fair algorithms that use only one protected variable. Five demographic characteristics (gender, ethnicity, career stage, university rank, and geolocation) are included in our multidimensional author profiles. The Overall Diversity approach ranks publications using an overall diversity score. The Round Robin Diversity technique chooses papers from authors who are members of each protected group in turn, whereas the Multifaceted Diversity method chooses papers that first fill the demographic feature with the highest importance. We compare the effectiveness of author diversity profiles based on Boolean and continuous-valued features. Selecting from a pool of SIGCHI 2017, DIS 2017, and IUI 2017 papers, we recommend papers for SIGCHI 2017 and evaluate these algorithms using the profiles. We compare the recommended papers with those that were selected by the conference. We find that, with profiles using either Boolean or continuous feature values, all three techniques increase diversity with little or no decrease in utility. Our best technique, Multifaceted Diversity, recommends a set of papers that matches demographic parity, choosing authors who are 42.50% more diverse with a 2.45% boost in utility. This strategy could also be applied to the selection of grant proposals, conference papers, journal articles, and other academic tasks.
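The Round Robin Diversity idea (visiting each protected group in turn) can be sketched in Python as a minimal illustration. This is not the paper's implementation. It assumes that every candidate paper has a single utility score and a set of protected groups represented among its authors, that groups are visited in a fixed order, and that each group takes its highest-utility unselected paper. The names `round_robin_select`, `Paper`, and the group labels are hypothetical.

```python
from typing import List, Set, Tuple

# (paper_id, utility score, protected groups represented among the paper's authors)
Paper = Tuple[str, float, Set[str]]


def round_robin_select(papers: List[Paper], groups: List[str], k: int) -> List[str]:
    """Visit protected groups in turn; each group takes its best remaining paper."""
    ranked = sorted(papers, key=lambda p: p[1], reverse=True)  # highest utility first
    selected: List[str] = []
    chosen: Set[str] = set()
    while len(selected) < k:
        progress = False
        for group in groups:
            if len(selected) == k:
                break
            for pid, _, member_groups in ranked:
                if pid not in chosen and group in member_groups:
                    selected.append(pid)
                    chosen.add(pid)
                    progress = True
                    break
        if not progress:  # no group has an unselected paper left
            break
    return selected
```

With papers p1 (0.9, early career), p2 (0.8, early career), p3 (0.5, female), and p4 (0.4, female and early career), groups ["female", "early_career"], and k = 3, the sketch returns ["p3", "p1", "p4"], whereas ranking by utility alone would return p1, p2, p3.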
Submitted 1 May, 2023;
originally announced May 2023.
-
Micron-BERT: BERT-based Facial Micro-Expression Recognition
Authors:
Xuan-Bac Nguyen,
Chi Nhan Duong,
Xin Li,
Susan Gauch,
Han-Seok Seo,
Khoa Luu
Abstract:
Micro-expression recognition is one of the most challenging topics in affective computing. It aims to recognize tiny facial movements that are difficult for humans to perceive within a brief period, i.e., 0.25 to 0.5 seconds. Recent advances in pre-training deep Bidirectional Transformers (BERT) have significantly improved self-supervised learning tasks in computer vision. However, the standard BERT in vision problems is designed to learn only from full images or videos, and the architecture cannot accurately detect details of facial micro-expressions. This paper presents Micron-BERT (μ-BERT), a novel approach to facial micro-expression recognition. The proposed method can automatically capture these movements in an unsupervised manner based on two key ideas. First, we employ Diagonal Micro-Attention (DMA) to detect tiny differences between two frames. Second, we introduce a new Patch of Interest (PoI) module to localize and highlight micro-expression interest regions while reducing noisy backgrounds and distractions. By incorporating these components into an end-to-end deep network, the proposed μ-BERT significantly outperforms all previous work in various micro-expression tasks. μ-BERT can be trained on a large-scale unlabeled dataset, i.e., up to 8 million images, and achieves high accuracy on new unseen facial micro-expression datasets. Empirical experiments show that μ-BERT consistently outperforms state-of-the-art methods on four micro-expression benchmarks, including SAMM, CASME II, SMIC, and CASME3, by significant margins. Code will be available at https://github.com/uark-cviu/Micron-BERT
Submitted 6 April, 2023;
originally announced April 2023.
-
An Automated Method to Enrich Consumer Health Vocabularies Using GloVe Word Embeddings and An Auxiliary Lexical Resource
Authors:
Mohammed Ibrahim,
Susan Gauch,
Omar Salman,
Mohammed Alqahatani
Abstract:
Background: Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical terminology, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen's medical terms to professional medical terms and vice versa.
Objective: Many of these vocabularies are built manually or semi-automatically, which requires large investments of time and human effort and consequently slows their growth. In this paper, we present an automatic method to enrich laymen's vocabularies that can be applied to vocabularies in any domain.
Methods: Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Representation (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies (CHV). Our approach further improves the CHV by incorporating synonyms and hyponyms from the WordNet ontology. The basic GloVe and our novel algorithms incorporating WordNet were evaluated using two laymen datasets from the National Library of Medicine (NLM): the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary.
Results: The results show that GloVe was able to find new laymen terms with an F-score of 48.44%. Our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%. In addition, the enhanced GloVe results were statistically significant on both ground truth datasets (P<.001).
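The candidate-generation step described under Methods can be illustrated with a minimal Python sketch: rank vocabulary words by cosine similarity to a seed consumer term, then append extra synonyms. This is not the authors' algorithm. The vectors below are toy values, not trained GloVe embeddings, and the plain `synonyms` dictionary is a hypothetical stand-in for the WordNet lookup. The names `cosine` and `candidate_terms` are illustrative.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def candidate_terms(seed: str, vectors: Dict[str, List[float]], top_n: int,
                    synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the top_n embedding neighbours of `seed`, plus supplied synonyms."""
    scored = sorted(((cosine(vectors[seed], v), w) for w, v in vectors.items() if w != seed),
                    reverse=True)
    neighbours = [w for _, w in scored[:top_n]]
    # Hypothetical stand-in for the WordNet step: append synonyms not already proposed.
    extra = [s for s in synonyms.get(seed, []) if s not in neighbours and s != seed]
    return neighbours + extra
```

With toy vectors "tummy ache" = [1.0, 0.1], "belly pain" = [0.9, 0.2], and "sunburn" = [0.0, 1.0], `candidate_terms("tummy ache", vectors, 1, {"tummy ache": ["stomachache"]})` proposes ["belly pain", "stomachache"].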
Submitted 18 May, 2021;
originally announced May 2021.
-
WOVe: Incorporating Word Order in GloVe Word Embeddings
Authors:
Mohammed Ibrahim,
Susan Gauch,
Tyler Gerth,
Brandon Cox
Abstract:
Word vector representations open up new opportunities to extract useful information from unstructured text. Representing a word as a vector makes it easy for machine learning algorithms to understand a text and extract information from it. Word vector representations have been used in many applications, such as word synonyms, word analogy, syntactic parsing, and many others. GloVe, based on word contexts and matrix vectorization, is an effective vector-learning algorithm that improves on previous vector-learning algorithms. However, the GloVe model fails to explicitly consider the order in which words appear within their contexts. In this paper, multiple methods of incorporating word order in GloVe word embeddings are proposed. Experimental results show that our Word Order Vector (WOVe) word embeddings approach outperforms unmodified GloVe on the natural language tasks of analogy completion and word similarity. WOVe with direct concatenation slightly outperformed GloVe on the word similarity task, increasing average rank by 2%. However, it greatly improved on the GloVe baseline on a word analogy task, achieving an average 36.34% improvement in accuracy.
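Why word order matters for co-occurrence statistics can be shown with a small Python sketch that counts context words separately for each relative offset and concatenates one block of counts per offset. This is only an assumption about what an order-aware co-occurrence representation could look like. It is not the WOVe training procedure, and the names `positional_cooccurrence` and `concat_row` are hypothetical.

```python
from collections import Counter
from typing import List


def positional_cooccurrence(tokens: List[str], window: int) -> Counter:
    """Count (word, context, offset) triples; a negative offset means the context word comes first."""
    counts: Counter = Counter()
    for i, word in enumerate(tokens):
        for off in range(-window, window + 1):
            j = i + off
            if off != 0 and 0 <= j < len(tokens):
                counts[(word, tokens[j], off)] += 1
    return counts


def concat_row(counts: Counter, word: str, vocab: List[str], window: int) -> List[int]:
    """Concatenate one block of counts per offset, ordered -window..-1, +1..+window."""
    offsets = [o for o in range(-window, window + 1) if o != 0]
    return [counts[(word, ctx, off)] for off in offsets for ctx in vocab]
```

For "dog bites man" versus "man bites dog" with a window of 1, order-blind counts give "bites" the same row in both sentences. The per-offset row is [0, 1, 0, 0, 0, 1] in the first sentence and [0, 0, 1, 0, 1, 0] in the second (vocabulary order: bites, dog, man).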
Submitted 18 May, 2021;
originally announced May 2021.
-
Diverse Group Formation Based on Multiple Demographic Features
Authors:
Mohammed Alqahtani,
Susan Gauch,
Omar Salman,
Mohammed Ibrahim,
Reem Al-Saffar
Abstract:
The goal of group formation is to build a team to accomplish a specific task. Algorithms are employed to improve the effectiveness of the team so formed and the efficiency of the group selection process. However, there is concern that team formation algorithms could be biased against minorities due to the algorithms themselves or the data on which they are trained. Hence, it is essential to build fair team formation systems that incorporate demographic information into the process of building the group. Although there has been extensive work on modeling individuals' expertise for expert recommendation and/or team formation, there has been relatively little prior work on modeling demographics and incorporating demographics into the group formation process.
We propose a novel method to represent experts' demographic profiles based on multidimensional demographic features. Moreover, we introduce two diversity ranking algorithms that form a group by considering demographic features along with the minimum required skills. Unlike many ranking algorithms that consider one Boolean demographic feature (e.g., gender or race), our diversity ranking algorithms consider multiple multivalued demographic attributes simultaneously. We evaluate our proposed algorithms using a real dataset based on members of a computer science program committee. The results show that our algorithms form a program committee that is more diverse, with an acceptable loss in utility.
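A minimal Python sketch of diversity-aware group formation with a minimum-skill constraint is given below. It is a generic greedy coverage heuristic, not either of the two algorithms proposed in the paper. It assumes that each expert has one skill score and a dictionary of multivalued demographic attributes. Experts below `min_skill` are filtered out, and the sketch then repeatedly adds the expert who contributes the most not-yet-represented (attribute, value) pairs, breaking ties by skill. The names `diverse_committee` and `Expert` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (name, skill score, demographic attribute -> value)
Expert = Tuple[str, float, Dict[str, str]]


def diverse_committee(experts: List[Expert], min_skill: float, size: int) -> List[str]:
    """Greedily add the qualified expert who contributes the most unseen attribute values."""
    pool = [e for e in experts if e[1] >= min_skill]  # enforce the minimum required skill
    covered: Set[Tuple[str, str]] = set()
    chosen: List[str] = []
    while pool and len(chosen) < size:
        # Rank by number of new (attribute, value) pairs, breaking ties by skill.
        best = max(pool, key=lambda e: (len(set(e[2].items()) - covered), e[1]))
        pool.remove(best)
        chosen.append(best[0])
        covered.update(best[2].items())
    return chosen
```

With experts A (0.9, male, NA), B (0.85, male, NA), C (0.7, female, EU), and D (0.3, female, AS), a minimum skill of 0.5, and a committee size of 2, the sketch picks ["A", "C"], whereas ranking by skill alone would pick A and B. D is excluded by the skill threshold.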
Submitted 3 December, 2020; v1 submitted 9 August, 2020;
originally announced August 2020.
-
Enriching Consumer Health Vocabulary Using Enhanced GloVe Word Embedding
Authors:
Mohammed Ibrahim,
Susan Gauch,
Omar Salman,
Mohammed Alqahatani
Abstract:
Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV, or CHV for short) is a collection of medical terms written in plain English. It provides a list of simple, easy, and clear terms that laymen prefer to use rather than an equivalent professional medical term. The National Library of Medicine (NLM) has integrated and mapped the CHV terms to its Unified Medical Language System (UMLS). These CHV terms map to 56,000 professional concepts in the UMLS. We found that about 48% of these laymen's terms are still jargon and match the professional terms in the UMLS. In this paper, we present an enhanced word embedding technique that generates new CHV terms from consumer-generated text. We downloaded our corpus from a healthcare social media platform and evaluated our new method, which is based on iterative feedback to the word embedding, using ground truth built from the existing CHV terms. Our feedback algorithm outperformed unmodified GloVe and detected new CHV terms.
Submitted 13 April, 2020; v1 submitted 31 March, 2020;
originally announced April 2020.