-
Item Recommendation Using User Feedback Data and Item Profile
Authors:
Debashish Roy,
Rajarshi Roy Chowdhury,
Abdullah Bin Nasser,
Afdhal Azmi,
Marzieh Babaeianjelodar
Abstract:
Matrix factorization (MF) is a collaborative filtering (CF) based approach that is widely used in recommendation systems (RS). In this research work, we address the content recommendation problem for users of a content management system (CMS) based on users' feedback data. The CMS is used for publishing and pushing curated content to the employees of a company or an organization. We use users' feedback data and content data to solve the content recommendation problem. We prepare individual user profiles and then generate recommendation results based on different categories of user feedback data, including Direct Interaction, Social Share, and Reading Statistics. Subsequently, we analyze the effect of the different categories on the recommendation results. The results show that different categories of feedback data have different impacts on recommendation accuracy. The best performance is achieved when all types of data are included in the recommendation task. We also incorporate content similarity as a regularization term into an MF model to design a hybrid model. Experimental results show that the proposed hybrid model performs better than traditional MF-based models.
Submitted 28 June, 2022;
originally announced June 2022.
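The abstract above mentions adding content similarity as a regularization term to an MF model but does not give the exact objective. The following is a minimal sketch of one common way to do this, not the authors' implementation. It assumes a penalty of the form sim_reg * s_ij * ||q_i - q_j||^2 that pulls the latent vectors of similar items together; the function names, hyperparameters, and the `item_sim` dictionary are all hypothetical.

```python
import random

def train_hybrid_mf(ratings, item_sim, n_users, n_items, k=2,
                    lr=0.05, reg=0.02, sim_reg=0.1, epochs=200, seed=0):
    """SGD matrix factorization with an assumed item-similarity penalty.

    ratings:  list of (user, item, value) feedback triples
    item_sim: dict mapping (i, j) -> content similarity in [0, 1]
    """
    rng = random.Random(seed)
    # Small random initial user (P) and item (Q) latent factors.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        # Standard MF step: fit observed feedback with L2 regularization.
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        # Assumed content term: pull vectors of similar items together.
        for (i, j), s in item_sim.items():
            for f in range(k):
                diff = Q[i][f] - Q[j][f]
                Q[i][f] -= lr * sim_reg * s * diff
                Q[j][f] += lr * sim_reg * s * diff
    return P, Q

def predict(P, Q, u, i):
    """Predicted feedback score for user u on item i."""
    return sum(a * b for a, b in zip(P[u], Q[i]))
```

Setting `sim_reg=0` reduces this sketch to a plain regularized MF model, which gives a simple baseline for comparing the effect of the content term.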
-
Explainable and High-Performance Hate and Offensive Speech Detection
Authors:
Marzieh Babaeianjelodar,
Gurram Poorna Prudhvi,
Stephen Lorenz,
Keyu Chen,
Sumona Mondal,
Soumyabrata Dey,
Navin Kumar
Abstract:
The spread of information through social media platforms can create environments possibly hostile to vulnerable communities and silence certain groups in society. To mitigate such instances, several models have been developed to detect hate and offensive speech. Since detecting hate and offensive speech in social media platforms could incorrectly exclude individuals from social media platforms, which can reduce trust, there is a need to create explainable and interpretable models. Thus, we build an explainable and interpretable high-performance model based on the XGBoost algorithm, trained on Twitter data. For unbalanced Twitter data, XGBoost outperformed the LSTM, AutoGluon, and ULMFiT models on hate speech detection with an F1 score of 0.75 compared to 0.38, 0.37, and 0.38, respectively. When we down-sampled the data to three separate classes of approximately 5000 tweets, XGBoost performed better than LSTM, AutoGluon, and ULMFiT, with F1 scores for hate speech detection of 0.79 vs 0.69, 0.77, and 0.66, respectively. XGBoost also performed better than LSTM, AutoGluon, and ULMFiT in the down-sampled version for offensive speech detection with F1 score of 0.83 vs 0.88, 0.82, and 0.79, respectively. We use Shapley Additive Explanations (SHAP) on our XGBoost models' outputs to make them explainable and interpretable, in contrast to LSTM, AutoGluon, and ULMFiT, which are black-box models.
Submitted 24 September, 2023; v1 submitted 26 June, 2022;
originally announced June 2022.
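The abstract above describes down-sampling the Twitter data to three classes of roughly 5000 tweets each. As an illustration only, the sketch below shows one simple way to down-sample a labeled dataset to a fixed number of examples per class. The function name, the (text, label) tuple layout, and the seed are assumptions; the abstract does not state how the authors performed this step.

```python
import random
from collections import defaultdict

def downsample(examples, per_class, seed=0):
    """Return at most `per_class` randomly chosen examples for each label.

    examples: list of (text, label) pairs (hypothetical layout)
    """
    rng = random.Random(seed)
    # Group the examples by their class label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    balanced = []
    for label in sorted(by_label):
        group = by_label[label]
        # Sample without replacement; keep the whole class if it is small.
        balanced.extend(rng.sample(group, min(per_class, len(group))))
    # Shuffle so the classes are interleaved in the output.
    rng.shuffle(balanced)
    return balanced
```

For the setting in the abstract, a call such as `downsample(tweets, per_class=5000)` would yield up to 5000 tweets for each of the three classes, where `tweets` is a hypothetical list of labeled tweets.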
-
How is Vaping Framed on Online Knowledge Dissemination Platforms?
Authors:
Keyu Chen,
Yiwen Shi,
Jun Luo,
Joyce Jiang,
Shweta Yadav,
Munmun De Choudhury,
Ashiqur R. KhudaBukhsh,
Marzieh Babaeianjelodar,
Frederick Altice,
Navin Kumar
Abstract:
We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is framed across multiple knowledge dissemination platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow). We use various NLP techniques to understand these differences. For example, n-grams, emotion recognition, and question answering results indicate that Medium, Quora, and Stack Exchange are appropriate venues for those looking to transition from smoking to vaping. Other platforms (Reddit, wikiHow) are more for vaping hobbyists and may not sufficiently dissuade youth vaping. Conversely, Wikipedia may exaggerate vaping harms, dissuading smokers from transitioning. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to design informational tools to reinforce or mitigate vaping (mis)perceptions online.
Submitted 22 July, 2022; v1 submitted 17 June, 2022;
originally announced June 2022.
-
Partisan US News Media Representations of Syrian Refugees
Authors:
Keyu Chen,
Marzieh Babaeianjelodar,
Yiwen Shi,
Kamila Janmohamed,
Rupak Sarkar,
Ingmar Weber,
Thomas Davidson,
Munmun De Choudhury,
Jonathan Huang,
Shweta Yadav,
Ashique Khudabukhsh,
Preslav Ivanov Nakov,
Chris Bauch,
Orestis Papakyriakopoulos,
Kaveh Khoshnood,
Navin Kumar
Abstract:
We investigate how representations of Syrian refugees (2011-2021) differ across US partisan news outlets. We analyze 47,388 articles from the online US media about Syrian refugees to detail differences in reporting between left- and right-leaning media. We use various NLP techniques to understand these differences. Our polarization and question answering results indicated that left-leaning media tended to represent refugees as child victims, welcome in the US, while right-leaning media cast refugees as Islamic terrorists. We noted similar results with our sentiment and offensive speech scores over time, which detail possibly unfavorable representations of refugees in right-leaning media. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to intervene around refugee representations and to design communications campaigns that improve the way society sees refugees and possibly aid refugee outcomes.
Submitted 17 June, 2022;
originally announced June 2022.
-
US News and Social Media Framing around Vaping
Authors:
Keyu Chen,
Marzieh Babaeianjelodar,
Yiwen Shi,
Rohan Aanegola,
Lam Yin Cheung,
Preslav Ivanov Nakov,
Shweta Yadav,
Angus Bancroft,
Ashiqur R. KhudaBukhsh,
Munmun De Choudhury,
Frederick L. Altice,
Navin Kumar
Abstract:
In this paper, we investigate how vaping is framed differently (2008-2021) between US news and social media. We analyze 15,711 news articles and 1,231,379 Facebook posts about vaping to study the differences in framing between media varieties. We use word embeddings to provide two-dimensional visualizations of the semantic changes around vaping for news and for social media. We detail that news media framing of vaping shifted over time in line with emergent regulatory trends, such as flavored vaping bans, with little discussion around vaping as a smoking cessation tool. We found that social media discussions were far more varied, with transitions toward vaping both as a public health harm and as a smoking cessation tool. Our cloze test, dynamic topic model, and question answering showed similar patterns, where social media, but not news media, characterizes vaping as a combustible cigarette substitute. We use n-grams to detail that social media data first centered on vaping as a smoking cessation tool, and in 2019 moved toward narratives around vaping regulation, similar to news media frames. Overall, social media tracks the evolution of vaping as a social practice, while news media reflects more risk-based concerns. A strength of our work is how the different techniques we have applied validate each other. Stakeholders may utilize our findings to intervene around the framing of vaping, and may design communications campaigns that improve the way society sees vaping, thus possibly aiding smoking cessation and reducing youth vaping.
Submitted 22 July, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Is Machine Learning Speaking my Language? A Critical Look at the NLP-Pipeline Across 8 Human Languages
Authors:
Esma Wali,
Yan Chen,
Christopher Mahoney,
Thomas Middleton,
Marzieh Babaeianjelodar,
Mariama Njie,
Jeanna Neefe Matthews
Abstract:
Natural Language Processing (NLP) is increasingly used as a key ingredient in critical decision-making systems such as resume parsers used in sorting a list of job candidates. NLP systems often ingest large corpora of human text, attempting to learn from past human behavior and decisions in order to produce systems that will make recommendations about our future world. Over 7000 human languages are being spoken today, and the typical NLP pipeline underrepresents speakers of most of them while amplifying the voices of speakers of other languages. In this paper, a team including speakers of 8 languages (English, Chinese, Urdu, Farsi, Arabic, French, Spanish, and Wolof) takes a critical look at the typical NLP pipeline and at how, even when a language is technically supported, substantial caveats remain that prevent full participation. Despite huge and admirable investments in multilingual support in many tools and resources, we are still making NLP-guided decisions that systematically and dramatically underrepresent the voices of much of the world.
Submitted 11 July, 2020;
originally announced July 2020.