Search | arXiv e-print repository

doi 10.1109/AISP61396.2024.10475308

Mining Influential Spreaders in Complex Networks by an Effective Combination of the Degree and K-Shell

Authors: Shima Esfandiari, Seyed Mostafa Fakhrahmad

Abstract: Graph mining is an important technique that is used in many applications, such as predicting and understanding behaviors and information dissemination within networks. One crucial aspect of graph mining is the identification and ranking of influential nodes, which has applications in various fields including marketing, social communications, and disease control. However, existing models and methods come with high computational complexity and may not accurately distinguish and identify influential nodes. This paper develops a method based on the k-shell index and degree centrality of nodes and their neighbors. Comparisons to previous works, such as Degree and Neighborhood information Centrality (DNC) and Neighborhood and Path Information Centrality (NPIC), are conducted. The evaluations, which include the correctness with Kendall's Tau, resolution with monotonicity index, correlation plots, and time complexity, demonstrate its superior results.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 6 pages, In 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), pp. 1-6. IEEE, 2024

MSC Class: IEEE

arXiv:2404.16198 [pdf]

Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model

Authors: Mojdeh Rahmanian, Seyed Mostafa Fakhrahmad, Seyedeh Zahra Mousavi

Abstract: Objective: Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants. Although leveraging electronic health records (EHR) for recruitment has gained popularity, the complex nature of unstructured medical texts presents challenges in efficiently identifying participants. Natural Language Processing (NLP) techniques have emerged as a solution with a recent focus on transformer models. In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task from unstructured medical notes collected in the EHR. Methods: To process the medical records, we selected the sentences of the records most related to the eligibility criteria needed for the trial. The SNOMED CT concepts related to each eligibility criterion were collected. Medical records were also annotated with MedCAT based on the SNOMED CT ontology. Annotated sentences including concepts matched with the criteria-relevant terms were extracted. A prompt-based large language model (Generative Pre-trained Transformer (GPT) in this study) was then used with the extracted sentences as the training set. To assess its effectiveness, we evaluated the model's performance using the dataset from the 2018 n2c2 challenge, which aimed to classify medical records of 311 patients based on 13 eligibility criteria through NLP techniques. Results: Our proposed model showed overall micro and macro F measures of 0.9061 and 0.8060, which were among the highest scores achieved by the experiments performed with this dataset. Conclusion: The application of a prompt-based large language model in this study to classify patients based on eligibility criteria received promising scores. Besides, we proposed a method of extractive summarization with the aid of SNOMED CT ontology that can also be applied to other medical texts.

Submitted 24 April, 2024; originally announced April 2024.

ACM Class: I.7

arXiv:2401.12993 [pdf]

Estimating the severity of dental and oral problems via sentiment classification over clinical reports

Authors: Sare Mahdavifar, Seyed Mostafa Fakhrahmad, Elham Ansarifard

Abstract: Analyzing authors' sentiments in texts as a technique for identifying text polarity can be practical and useful in various fields, including medicine and dentistry. Currently, due to factors such as patients' limited knowledge about their condition, difficulties in accessing specialist doctors, or fear of illness, particularly in pandemic conditions, there might be a delay between receiving a radiology report and consulting a doctor. In some cases, this delay can pose significant risks to the patient, making timely decision-making crucial. Having an automatic system that can inform patients about the deterioration of their condition by analyzing the text of radiology reports could greatly impact timely decision-making. In this study, a dataset comprising 1,134 cone-beam computed tomography (CBCT) photo reports was collected from the Shiraz University of Medical Sciences. Each case was examined, and an expert labeled a severity level for the patient's condition on each document. After preprocessing all the text data, a deep learning model based on Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network architecture, known as CNN-LSTM, was developed to detect the severity level of the patient's problem based on sentiment analysis in the radiologist's report. The model's performance was evaluated on two datasets, each with two and four classes, in both imbalanced and balanced scenarios. Finally, to demonstrate the effectiveness of our model, we compared its performance with that of other classification models. The results, along with one-way ANOVA and Tukey's test, indicated that our proposed model (CNN-LSTM) performed the best according to precision, recall, and f-measure criteria. This suggests that it can be a reliable model for estimating the severity of oral and dental diseases, thereby assisting patients.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2309.15742 [pdf, other]

doi 10.1016/j.jss.2024.112083

T5APR: Empowering Automated Program Repair across Languages through Checkpoint Ensemble

Authors: Reza Gharibi, Mohammad Hadi Sadreddini, Seyed Mostafa Fakhrahmad

Abstract: Automated program repair (APR) using deep learning techniques has become an important area of research in recent years, aiming to automatically generate bug-fixing patches that can improve software reliability and maintainability. However, most existing methods either target a single language or require high computational resources to train multilingual models. In this paper, we propose T5APR, a novel neural program repair approach that provides a unified solution for bug fixing across multiple programming languages. T5APR leverages CodeT5, a powerful pre-trained text-to-text transformer model, and adopts a checkpoint ensemble strategy to improve patch recommendation. We conduct comprehensive evaluations on six well-known benchmarks in four programming languages (Java, Python, C, JavaScript), demonstrating T5APR's competitiveness against state-of-the-art techniques. T5APR correctly fixes 1,985 bugs, including 1,442 bugs that none of the compared techniques has fixed. We further support the effectiveness of our approach by conducting detailed analyses, such as comparing the correct patch ranking among different techniques. The findings of this study demonstrate the potential of T5APR for use in real-world applications and highlight the importance of multilingual approaches in the field of APR.

Submitted 30 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted to the Journal of Systems and Software

Journal ref: J. Syst. Softw. 214 (2024) 112083

arXiv:1803.01562 [pdf]

Local Distance Metric Learning for Nearest Neighbor Algorithm

Authors: Hossein Rajabzadeh, Mansoor Zolghadri Jahromi, Mohammad Sadegh Zare, Mostafa Fakhrahmad

Abstract: Distance metric learning is a successful way to enhance the performance of the nearest neighbor classifier. In most cases, however, the distribution of data does not obey a regular form and may change in different parts of the feature space. To address this, this paper proposes a novel local distance metric learning method, namely Local Mahalanobis Distance Learning (LMDL), to enhance the performance of the nearest neighbor classifier. LMDL considers the neighborhood influence and learns multiple distance metrics for a reduced set of input samples. The samples in the reduced set are called prototypes, and they try to preserve local discriminative information as much as possible. The proposed LMDL can be kernelized very easily, which is highly desirable in the case of highly nonlinear data. The quality and efficiency of the proposed method are assessed through a set of different experiments on various datasets, and the obtained results show that LMDL, as well as its kernelized version, is superior to the other related state-of-the-art methods.

Submitted 15 March, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

Comments: 13 pages

Showing 1–5 of 5 results for author: Fakhrahmad, M