Showing 1–2 of 2 results for author: Qamar, U

Search v0.5.6 released 2020-02-24

arXiv:2303.07024 [pdf]

cs.CL cs.AI

Addressing Biases in the Texts using an End-to-End Pipeline Approach

Authors: Shaina Raza, Syed Raza Bashir, Sneha, Urooj Qamar

Abstract: The concept of fairness is gaining popularity in academia and industry. Social media is especially vulnerable to media biases and toxic language and comments. We propose a fair ML pipeline that takes a text as input and determines whether it contains biases and toxic content. Then, based on pre-trained word embeddings, it suggests a set of new words by substituting the biased words; the idea is to lessen the effects of those biases by replacing them with alternative words. We compare our approach to existing fairness models to determine its effectiveness. The results show that our proposed pipeline can detect, identify, and mitigate biases in social media data.

Submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted in Bias @ ECIR 2023

arXiv:2112.08486 [pdf]

cs.IR cs.AI

Text Mining Through Label Induction Grouping Algorithm Based Method

Authors: Gulshan Saleem, Nisar Ahmed, Usman Qamar

Abstract: The main focus of information retrieval methods is to provide accurate and efficient results that are also cost-effective. LINGO (Label Induction Grouping Algorithm) is a clustering algorithm that aims to provide search results in the form of quality clusters, but it has a few limitations. In this paper, our focus is on achieving more meaningful results and improving the overall performance of the algorithm. LINGO works in two main steps: cluster label induction using the Latent Semantic Indexing (LSI) technique, and cluster content discovery using the Vector Space Model (VSM). As LINGO uses VSM in cluster content discovery, our task is to replace VSM with LSI for cluster content discovery and to analyze the feasibility of using LSI with Okapi BM25. The next task is to compare the results of the modified method with the original LINGO method. The research is applied to five different text-based data sets to get more reliable results for every method. Research results show that LINGO produces 40-50% better results when using LSI for content discovery. From theoretical evidence, using Okapi BM25 as the scoring method in LSI (LSI+Okapi BM25) for cluster content discovery instead of VSM also results in better cluster generation in terms of scalability and performance when compared to both VSM's and LSI's results.

Submitted 15 December, 2021; originally announced December 2021.

Comments: Presented at the 5th International Multidisciplinary Conference, 29-31 Oct., at ICBS, Lahore

Journal ref: Science International 28.1 (2016)
