Search | arXiv e-print repository

BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering

Authors: Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Dong-Kyu Chae

Abstract: Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications because they link related entities and provide context-rich information, supporting efficient information retrieval and knowledge discovery and presenting information flow effectively. Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets, encoders, NER (named entity recognition) models, POS (part-of-speech) taggers, and lemmatizers, which hinders efficient information processing and reasoning applications in the language. To address the KG scarcity in Bengali, we propose BanglaAutoKG, a pioneering framework that can automatically construct Bengali KGs from any Bangla text. We utilize multilingual LLMs to understand various languages and correlate entities and relations universally. We construct the foundational KG by employing a translation dictionary to identify English equivalents and extracting word features from pre-trained BERT models. To reduce noise and align word embeddings with our goal, we employ graph-based polynomial filters. Lastly, we implement a GNN-based semantic filter, which elevates contextual understanding and trims unnecessary edges, culminating in the definitive KG. Empirical findings and case studies demonstrate the universal effectiveness of our model, which can autonomously construct semantically enriched KGs from any text.

Submitted 5 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: 7 pages, 3 figures. Accepted to LREC-COLING 2024. Read in ACL Anthology: https://aclanthology.org/2024.lrec-main.189/

Journal ref: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
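The abstract mentions graph-based polynomial filters for denoising word embeddings but does not specify the polynomial basis or its coefficients. As an illustration only, the sketch below applies a monomial filter Y = θ0·X + θ1·L·X + ... + θK·L^K·X over the symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2). The function names, the dense nested-list representation, and the choice of a monomial basis are assumptions, not the BanglaAutoKG implementation.

```python
import math


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2) of a dense adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Isolated nodes (degree 0) get a zero scaling factor instead of dividing by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def polynomial_filter(adj, features, thetas):
    """Return Y = sum_k thetas[k] * L^k X for a node feature matrix X (n rows, d columns)."""
    if not thetas:
        raise ValueError("need at least one coefficient")
    lap = normalized_laplacian(adj)
    term = [row[:] for row in features]  # L^0 X = X
    out = [[thetas[0] * v for v in row] for row in term]
    for theta in thetas[1:]:
        term = matmul(lap, term)  # L^k X computed from L^(k-1) X
        out = [[o + theta * t for o, t in zip(o_row, t_row)] for o_row, t_row in zip(out, term)]
    return out
```

With coefficients [1.0, -0.5] on a two-node graph, the filter pulls the two node features toward their mean ([[1.0], [3.0]] becomes [[2.0], [2.0]]), which is the low-pass smoothing behavior such filters are typically used for.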

arXiv:2403.17210

CADGL: Context-Aware Deep Graph Learning for Predicting Drug-Drug Interactions

Authors: Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Serbetar Karlo, Dong-Kyu Chae

Abstract: Examining Drug-Drug Interactions (DDIs) is a pivotal element of drug development. DDIs occur when one drug's properties are affected by the inclusion of other drugs. Detecting favorable DDIs has the potential to pave the way for creating and advancing innovative medications applicable in practical settings. However, existing DDI prediction models continue to face challenges related to generalization in extreme cases, robust feature extraction, and real-life applicability. We address these challenges by introducing CADGL, a novel framework that leverages context-aware deep graph learning. Based on a customized variational graph autoencoder (VGAE), CADGL captures critical structural and physicochemical information using two context preprocessors that extract features from two different perspectives, local neighborhood and molecular context, in a heterogeneous graph structure. Our customized VGAE consists of a graph encoder, a latent information encoder, and an MLP decoder. CADGL surpasses other state-of-the-art DDI prediction models and excels at predicting clinically valuable novel DDIs, as supported by rigorous case studies.

Submitted 27 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: 8 Pages, 4 Figures; In review
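The abstract names the VGAE components (graph encoder, latent information encoder, MLP decoder) without giving their equations. The sketch below shows only the generic variational pieces that any VGAE shares: sampling a latent vector with the reparameterization trick, the KL penalty toward a standard normal prior, and a small MLP that scores a drug pair from two latent vectors. The concatenation-based decoder input, the single hidden layer, and all function names are assumptions for illustration, not CADGL's actual architecture.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), the usual VGAE regularization term."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mlp_pair_decoder(z_i, z_j, w1, b1, w2, b2):
    """Score a drug pair: concatenate latents -> linear -> ReLU -> linear -> sigmoid.

    w1 has one row per hidden unit, each of length len(z_i) + len(z_j);
    w2 has one weight per hidden unit; b2 is a scalar bias.
    """
    x = list(z_i) + list(z_j)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
```

In training, the KL term would be added to a reconstruction loss on known interaction pairs; the decoder's output is read as the predicted probability that the two drugs interact.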

arXiv:2403.12984

When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings

Authors: Azmine Toushik Wasi, Šerbetar Karlo, Raima Islam, Taki Hasan Rafi, Dong-Kyu Chae

Abstract: Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of atoms and bonds. These SMILES strings are used in various complex machine learning-based drug-related research and representation works. Escaping from complex representation, in this work we pose a single question: what if we treat drug SMILES as conventional sentences and apply text classification to drug classification? Our experiments affirm this possibility with very competitive scores. The study explores the notion of viewing each atom and bond as a sentence component, employing basic NLP methods to categorize drug types, and shows that complex problems can also be solved from simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP.

Submitted 27 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 7 pages, 2 figures, 5 tables. Accepted (invited to present) to The Second Tiny Papers Track at ICLR 2024 (https://openreview.net/forum?id=VUYCyH8fCw)

Journal ref: The Second Tiny Papers Track at ICLR 2024, Tiny Papers @ ICLR 2024, Vienna, Austria, May 11, 2024
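The paper's central idea is to treat each atom and bond symbol in a SMILES string as a word. A minimal sketch of that idea is shown below. It assumes a regex tokenizer of the kind commonly used for SMILES and a multinomial Naive Bayes classifier with add-one smoothing; the paper only says "basic NLP methods", so the classifier choice, the class names, and the toy training strings are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Regex commonly used to split SMILES into atom/bond/ring/branch tokens (bracket atoms kept whole).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into 'words' (atoms, bonds, ring digits, branches)."""
    return SMILES_TOKEN.findall(smiles)


class TokenNaiveBayes:
    """Multinomial Naive Bayes over SMILES tokens with add-one (Laplace) smoothing."""

    def fit(self, smiles_list, labels):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.class_counts = Counter(labels)  # label -> number of training strings
        for smiles, label in zip(smiles_list, labels):
            self.token_counts[label].update(tokenize(smiles))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, smiles):
        tokens = tokenize(smiles)
        n_train = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.token_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(self.class_counts[label] / n_train)
            score += sum(math.log((counts[tok] + 1) / denom) for tok in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For example, after fitting on ["c1ccccc1", "c1ccncc1", "CCO", "CCCC"] with labels ["aromatic", "aromatic", "aliphatic", "aliphatic"], predict("CCCO") returns "aliphatic" and predict("c1ccoc1") returns "aromatic". Any bag-of-words text classifier could replace Naive Bayes here; the key step is the tokenizer that turns a molecule into a sentence.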

arXiv:2306.08402

Fairness and Privacy-Preserving in Federated Learning: A Survey

Authors: Taki Hasan Rafi, Faiza Anan Noor, Tahmid Hussain, Dong-Kyu Chae

Abstract: Federated learning (FL), a form of distributed machine learning, has gained popularity as a privacy-aware technique that prevents privacy leakage by building a global model while decentralized edge clients train individually on their own private data. Existing works, however, employ privacy mechanisms such as Secure Multiparty Computation (SMC) and Differential Privacy (DP), which suffer from susceptibility to interference, massive computational overhead, and low accuracy, among other issues. With the increasingly broad deployment of FL systems, it is challenging to ensure fairness and maintain active client participation. Very few works ensure reasonably satisfactory performance for numerous diverse clients, and they fail to prevent potential bias against particular demographics in FL systems. Current efforts fail to strike a balance among privacy, fairness, and model performance in FL systems and are vulnerable to a number of additional problems. In this paper, we provide a comprehensive survey of the basic concepts of FL, the existing privacy challenges, techniques, and relevant works concerning privacy in FL. We also provide an extensive overview of the growing fairness challenges, existing fairness notions, and the limited works that address both privacy and fairness in FL. By comprehensively describing existing FL systems, we present potential future directions for addressing the challenges of privacy-preserving and fairness-aware FL systems.

Submitted 14 July, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 23 pages; 2 figures
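The survey lists Differential Privacy (DP) among the mechanisms used in FL. One common client-side pattern, shown here as a hedged sketch rather than anything the survey prescribes, is to clip each local update to an L2 norm bound C and add Gaussian noise with standard deviation σ·C before sending it to the server. The function and parameter names are assumptions, and the snippet performs no privacy accounting, so on its own it does not certify any (ε, δ) guarantee.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip an update, then add N(0, (noise_multiplier * clip_norm)^2) noise to every coordinate.

    This is the Gaussian-mechanism step only; choosing noise_multiplier for a target
    (epsilon, delta) requires a separate privacy accountant, which is not shown here.
    """
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [c + rng.gauss(0.0, sigma) for c in clipped]
```

Clipping bounds each client's influence on the aggregate, which is what lets the added noise mask any single client's contribution. Larger noise multipliers strengthen privacy but lower accuracy, one instance of the privacy/performance trade-off the abstract describes.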

arXiv:2305.01486

ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning

Authors: Azmine Toushik Wasi, Karlo Šerbetar, Raima Islam, Taki Hasan Rafi, Dong-Kyu Chae

Abstract: In this paper, we introduce ARBEx, a novel attentive feature extraction framework driven by a Vision Transformer (ViT) with reliability balancing, to cope with poor class distributions, bias, and uncertainty in the facial expression learning (FEL) task. We reinforce several data pre-processing and refinement methods along with a window-based cross-attention ViT to make the most of the data. We also employ learnable anchor points in the embedding space with label distributions and a multi-head self-attention mechanism to optimize performance against weak predictions with reliability balancing, a strategy that leverages anchor points, attention scores, and confidence values to enhance the resilience of label predictions. To ensure correct label classification and improve the model's discriminative power, we introduce an anchor loss, which encourages large margins between anchor points. Additionally, the multi-head self-attention mechanism, which is also trainable, plays an integral role in identifying accurate labels. This approach provides critical elements for improving the reliability of predictions and has a substantial positive effect on final prediction capabilities. Our adaptive model can be integrated with any deep neural network to forestall challenges in various recognition tasks. According to extensive experiments conducted in a variety of contexts, our strategy outperforms current state-of-the-art methodologies.

Submitted 14 July, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 12 pages, 7 figures. Code: https://github.com/takihasan/ARBEx
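The abstract states that the anchor loss encourages large margins between anchor points but does not give its formula. One plausible form, assumed here purely for illustration, is a hinge penalty max(0, m - ||a_i - a_j||) averaged over all anchor pairs, which is zero once every pair of anchors is at least m apart. The margin m, the Euclidean distance, and the function name are hypothetical; the linked repository is the authority on the actual loss.

```python
import math
from itertools import combinations


def anchor_margin_loss(anchors, margin):
    """Mean hinge penalty max(0, margin - ||a_i - a_j||) over all pairs of anchor points.

    The loss is zero once every pair of anchors is at least `margin` apart, so minimizing it
    pushes anchors (for example, one or more per expression class) away from each other.
    """
    pairs = list(combinations(anchors, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += max(0.0, margin - math.dist(a, b))  # Euclidean distance (Python 3.8+)
    return total / len(pairs)
```

Because only pairs closer than the margin contribute, well-separated anchors stop receiving gradient, and the penalty concentrates on anchors that would make class boundaries ambiguous.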

arXiv:2303.14787

A Generalized Look at Federated Learning: Survey and Perspectives

Authors: Taki Hasan Rafi, Faiza Anan Noor, Tahmid Hussain, Dong-Kyu Chae, Zhaohui Yang

Abstract: Federated learning (FL) refers to a distributed machine learning framework that learns from several decentralized edge clients without sharing their local datasets. This distributed strategy prevents data leakage and enables on-device training, as the global model is updated based on the local model updates. Despite offering several advantages, including data privacy and scalability, FL poses challenges such as statistical and system heterogeneity of data in federated networks, communication bottlenecks, and privacy and security issues. This survey systematically summarizes previous work, studies, and experiments on FL and presents a list of possibilities for FL across a range of applications and use cases. It also covers various challenges of implementing FL and promising directions for addressing them.

Submitted 26 March, 2023; originally announced March 2023.

Comments: 9 pages, 2 figures
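The abstract describes the server updating the global model from local model updates. The best-known rule for this step is Federated Averaging (FedAvg), which weights each client's parameters by its number of local training examples. The sketch below shows that aggregation on flat parameter lists; it is a generic illustration rather than a method proposed in the survey, and the function name is an assumption.

```python
def federated_average(client_params, client_sizes):
    """Return the example-weighted mean of client parameter vectors (FedAvg aggregation)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter shape")
    # Each coordinate of the global model is a weighted average over clients.
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]
```

For two clients with parameters [1.0, 2.0] and [3.0, 4.0] holding 1 and 3 examples, the aggregate is [2.5, 3.5]. Weighting by data size is exactly where the statistical heterogeneity mentioned above matters: clients with large but unrepresentative datasets pull the global model toward their own distribution.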

Showing 1–6 of 6 results for author: Rafi, T H