Showing 1–28 of 28 results for author: Fard, F

  1. arXiv:2406.13840  [pdf, other]

    cs.AI cs.CL

    StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation

    Authors: Davit Abrahamyan, Fatemeh H. Fard

    Abstract: Developers spend much time finding information that is relevant to their questions. Stack Overflow has been the leading resource, and with the advent of Large Language Models (LLMs), generative models such as ChatGPT are used frequently. However, there is a catch in using each one separately. Searching for answers is time-consuming and tedious, as shown by the many tools developed by researchers t…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.00215  [pdf, other]

    cs.SE

    Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent

    Authors: Jie JW Wu, Fatemeh H. Fard

    Abstract: Large language models (LLMs) have significantly improved their ability to perform tasks in the field of code generation. However, there is still a gap between LLMs being capable coders and being top-tier software engineers. Based on the observation that top-level software engineers often ask clarifying questions to reduce ambiguity in both requirements and coding solutions, we argue that the same…

    Submitted 31 May, 2024; originally announced June 2024.

  3. arXiv:2405.01553  [pdf, ps, other]

    cs.SE cs.AI

    Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R

    Authors: Amirreza Esmaeili, Iman Saberi, Fatemeh H. Fard

    Abstract: Recently, Large Language Models (LLMs) have gained a lot of attention in the Software Engineering (SE) community. LLMs or their variants pre-trained on code are used for many SE tasks. A main approach for adapting LLMs to the downstream task is to fine-tune the models. However, with having billions-parameters-LLMs, fine-tuning the models is not practical. An alternative approach is using Parameter…

    Submitted 15 March, 2024; originally announced May 2024.

  4. arXiv:2402.04421  [pdf, other]

    cs.SE cs.AI

    Studying Vulnerable Code Entities in R

    Authors: Zixiao Zhao, Millon Madhur Das, Fatemeh H. Fard

    Abstract: Pre-trained Code Language Models (Code-PLMs) have shown many advancements and achieved state-of-the-art results for many software engineering tasks in the past few years. These models are mainly targeted for popular programming languages such as Java and Python, leaving out many other ones like R. Though R has a wide community of developers and users, there is little known about the applicability…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 5 pages, 3 figures, and 2 tables. To be published in ICPC 2024

  5. arXiv:2401.13802  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    Investigating the Efficacy of Large Language Models for Code Clone Detection

    Authors: Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema Rodríguez-Pérez, Mohamed Sami Shehata

    Abstract: Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. The LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to guide the model in accomplishing the task. GPT-based models are one of the popular ones studied for tasks such as code comment generation or test generation. These…

    Submitted 30 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  6. arXiv:2307.08540  [pdf, other]

    cs.SE

    Utilization of Pre-trained Language Model for Adapter-based Knowledge Transfer in Software Engineering

    Authors: Iman Saberi, Fatemeh Fard, Fuxiang Chen

    Abstract: Software Engineering (SE) Pre-trained Language Models (PLMs), such as CodeBERT, are pre-trained on large code corpora, and their learned knowledge has shown success in transferring into downstream tasks (e.g., code clone detection) through the fine-tuning of PLMs. In Natural Language Processing (NLP), an alternative in transferring the knowledge of PLMs is explored through the use of adapter, a co…

    Submitted 6 February, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted to EMSE

    MSC Class: 68N30 ACM Class: D.2.0; I.2.5

  7. arXiv:2307.07854  [pdf, other]

    cs.SE

    AdvFusion: Multilingual Adapter-based Knowledge Transfer for Code Summarization

    Authors: Iman Saberi, Fatemeh Fard, Fuxiang Chen

    Abstract: Parameter Efficient Fine-Tuning (PEFT) is an alternate choice to full fine-tuning a language model. Though PEFT methods are used in natural language domain widely, there are limited studies on using PEFT for language models that are pre-trained on code and comment datasets (i.e., code-LMs). Previous research has also shown that code summarization, a task that intends to generate natural descriptio…

    Submitted 2 February, 2024; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: under submission

    MSC Class: 68N30; 68T35 ACM Class: D.2.0; I.2.5

  8. arXiv:2303.06233  [pdf, other]

    cs.SE

    Model-Agnostic Syntactical Information for Pre-Trained Programming Language Models

    Authors: Iman Saberi, Fatemeh H. Fard

    Abstract: Pre-trained Programming Language Models (PPLMs) achieved many recent states of the art results for many code-related software engineering tasks. Though some studies use data flow or propose tree-based models that utilize Abstract Syntax Tree (AST), most PPLMs do not fully utilize the rich syntactical information in source code. Still, the input is considered a sequence of tokens. There are two iss…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: 11 pages, 5 figures. Accepted at ICSE 2023

  9. arXiv:2303.01645  [pdf, other]

    cs.SE cs.CL cs.LG

    APIContext2Com: Code Comment Generation by Incorporating Pre-Defined API Documentation

    Authors: Ramin Shahbazi, Fatemeh Fard

    Abstract: Code comments are significantly helpful in comprehending software programs and also aid developers to save a great deal of time in software maintenance. Code comment generation aims to automatically predict comments in natural language given a code snippet. Several works investigate the effect of integrating external knowledge on the quality of generated comments. In this study, we propose a solut…

    Submitted 2 March, 2023; originally announced March 2023.

  10. arXiv:2204.10200  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    An Exploratory Study on Code Attention in BERT

    Authors: Rishab Sharma, Fuxiang Chen, Fatemeh Fard, David Lo

    Abstract: Many recent models in software engineering introduced deep neural models based on the Transformer architecture or use transformer-based Pre-trained Language Models (PLM) trained on code. Although these models achieve the state of the arts results in many downstream tasks such as code summarization and bug detection, they are based on Transformer and PLM, which are mainly studied in the Natural Lan…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted in ICPC 2022

  11. arXiv:2204.09654  [pdf, other]

    cs.CL cs.AI cs.SE

    LAMNER: Code Comment Generation Using Character Language Model and Named Entity Recognition

    Authors: Rishab Sharma, Fuxiang Chen, Fatemeh Fard

    Abstract: Code comment generation is the task of generating a high-level natural language description for a given code method or function. Although researchers have been studying multiple ways to generate code comments automatically, previous work mainly considers representing a code token in its entirety semantics form only (e.g., a language model is used to learn the semantics of a code token), and additi…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted at ICPC 2022

  12. arXiv:2204.09653  [pdf, other]

    cs.PL cs.CL cs.SE

    On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages

    Authors: Fuxiang Chen, Fatemeh Fard, David Lo, Timofey Bryksin

    Abstract: A recent study by Ahmed and Devanbu reported that using a corpus of code written in multilingual datasets to fine-tune multilingual Pre-trained Language Models (PLMs) achieves higher performance as opposed to using a corpus of code written in just one programming language. However, no analysis was made with respect to fine-tuning monolingual PLMs. Furthermore, some programming languages are inhere…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted in ICPC 2022

  13. arXiv:2204.08653  [pdf, other]

    cs.SE cs.CL cs.LG

    On The Cross-Modal Transfer from Natural Language to Code through Adapter Modules

    Authors: Divyam Goel, Ramansh Grover, Fatemeh H. Fard

    Abstract: Pre-trained neural Language Models (PTLM), such as CodeBERT, are recently used in software engineering as models pre-trained on large source code corpora. Their knowledge is transferred to downstream tasks (e.g. code clone detection) via fine-tuning. In natural language processing (NLP), other alternatives for transferring the knowledge of PTLMs are explored through using adapters, compact, parame…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 11 pages, 6 figures, ICPC 2022. 30th International Conference on Program Comprehension (ICPC '22), May 16-17, 2022, Virtual Event, USA

  14. arXiv:2204.07501  [pdf, other]

    cs.SE

    Evaluating few shot and Contrastive learning Methods for Code Clone Detection

    Authors: Mohamad Khajezade, Fatemeh Hendijani Fard, Mohamed S. Shehata

    Abstract: Context: Code Clone Detection (CCD) is a software engineering task that is used for plagiarism detection, code search, and code comprehension. Recently, deep learning-based models have achieved an F1 score (a metric used to assess classifiers) of ~95% on the CodeXGLUE benchmark. These models require many training data, mainly fine-tuned on Java or C++ datasets. However, no previous study eva…

    Submitted 9 November, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

  15. On the Effectiveness of Pretrained Models for API Learning

    Authors: Mohammad Abdul Hadi, Imam Nur Bani Yusuf, Ferdian Thung, Kien Gia Luong, Jiang Lingxiao, Fatemeh H. Fard, David Lo

    Abstract: Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches utilize information retrieval models to search for matching API…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 12 pages, 4 figures, ICPC 2022

    Journal ref: 30th International Conference on Program Comprehension (ICPC '22), May 16-17, 2022, Virtual Event, USA

  16. arXiv:2202.02294  [pdf, other]

    cs.CL cs.LG cs.SE

    Pre-Trained Neural Language Models for Automatic Mobile App User Feedback Answer Generation

    Authors: Yue Cao, Fatemeh H. Fard

    Abstract: Studies show that developers' answers to the mobile app users' feedbacks on app stores can increase the apps' star rating. To help app developers generate answers that are related to the users' issues, recent studies develop models to generate the answers automatically. Aims: The app response generation models use deep neural networks and require training data. Pre-Trained neural language Models (…

    Submitted 4 February, 2022; originally announced February 2022.

    Comments: 6 pages, published in the 2021 ASE RAISE workshop

  17. arXiv:2111.07238  [pdf, other]

    cs.SE cs.AI cs.PL

    FACOS: Finding API Relevant Contents on Stack Overflow with Semantic and Syntactic Analysis

    Authors: Kien Luong, Mohammad Hadi, Ferdian Thung, Fatemeh Fard, David Lo

    Abstract: Collecting API examples, usages, and mentions relevant to a specific API method over discussions on venues such as Stack Overflow is not a trivial problem. It requires efforts to correctly recognize whether the discussion refers to the API method that developers/tools are searching for. The content of the thread, which consists of both text paragraphs describing the involvement of the API method i…

    Submitted 13 November, 2021; originally announced November 2021.

  18. arXiv:2109.09241  [pdf, other]

    eess.IV cs.CV cs.LG

    Robust Framework for COVID-19 Identification from a Multicenter Dataset of Chest CT Scans

    Authors: Sadaf Khademi, Shahin Heidarian, Parnian Afshar, Nastaran Enshaei, Farnoosh Naderkhani, Moezedin Javad Rafiee, Anastasia Oikonomou, Akbar Shafiee, Faranak Babaki Fard, Konstantinos N. Plataniotis, Arash Mohammadi

    Abstract: The objective of this study is to develop a robust deep learning-based framework to distinguish COVID-19, Community-Acquired Pneumonia (CAP), and Normal cases based on chest CT scans acquired in different imaging centers using various protocols, and radiation doses. We showed that while our proposed model is trained on a relatively small dataset acquired from only one imaging center using a specif…

    Submitted 28 July, 2022; v1 submitted 19 September, 2021; originally announced September 2021.

  19. arXiv:2105.14656  [pdf, other]

    eess.IV cs.CV cs.LG

    Human-level COVID-19 Diagnosis from Low-dose CT Scans Using a Two-stage Time-distributed Capsule Network

    Authors: Parnian Afshar, Moezedin Javad Rafiee, Farnoosh Naderkhani, Shahin Heidarian, Nastaran Enshaei, Anastasia Oikonomou, Faranak Babaki Fard, Reut Anconina, Keyvan Farahani, Konstantinos N. Plataniotis, Arash Mohammadi

    Abstract: Reverse transcription-polymerase chain reaction (RT-PCR) is currently the gold standard in COVID-19 diagnosis. It can, however, take days to provide the diagnosis, and false negative rate is relatively high. Imaging, in particular chest computed tomography (CT), can assist with diagnosis and assessment of this disease. Nevertheless, it is shown that standard dose CT scan gives significant radiatio…

    Submitted 1 December, 2021; v1 submitted 30 May, 2021; originally announced May 2021.

  20. arXiv:2104.05861  [pdf, other]

    cs.SE cs.LG

    Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews

    Authors: Mohammad Abdul Hadi, Fatemeh H. Fard

    Abstract: Context: Mobile app reviews written by users on app stores or social media are significant resources for app developers. Analyzing app reviews has proved to be useful for many areas of software engineering (e.g., requirement engineering, testing). Automatic classification of app reviews requires extensive efforts to manually curate a labeled dataset. When the classification purpose changes (e.g. i…

    Submitted 6 April, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: 55 pages, 13 tables, 6 figures, EMSE 2022

  21. arXiv:2103.10668  [pdf, other]

    cs.SE cs.CL cs.LG

    API2Com: On the Improvement of Automatically Generated Code Comments Using API Documentations

    Authors: Ramin Shahbazi, Rishab Sharma, Fatemeh H. Fard

    Abstract: Code comments can help in program comprehension and are considered as important artifacts to help developers in software maintenance. However, the comments are mostly missing or are outdated, especially in complex software projects. As a result, several automatic comment generation models are developed as a solution. The recent models explore the integration of external knowledge resources such as…

    Submitted 19 March, 2021; originally announced March 2021.

  22. arXiv:2103.09340  [pdf, other]

    cs.SE

    Technical Debt in the Peer-Review Documentation of R Packages: a rOpenSci Case Study

    Authors: Zadia Codabux, Melina Vidoni, Fatemeh H. Fard

    Abstract: Context: Technical Debt is a metaphor used to describe code that is "not quite right." Although TD studies have gained momentum, TD has yet to be studied as thoroughly in non-Object-Oriented (OO) or scientific software such as R. R is a multi-paradigm programming language, whose popularity in data science and statistical applications has amplified in recent years. Due to R's inherent ability to ex…

    Submitted 16 March, 2021; originally announced March 2021.

  23. arXiv:2011.06244  [pdf, other]

    cs.SE

    A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

    Authors: Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher Ahmed Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi, Shangwen Wang, Gema Rodríguez-Pérez, Ricardo Colomo-Palacios, Roberto Verdecchia, Paramvir Singh, Yihao Qin, Debasish Chakroborti, Willard Davis, Vijay Walunj, Hongjun Wu, et al. (23 additional authors not shown)

    Abstract: Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Metho…

    Submitted 13 October, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: Status: Accepted at Empirical Software Engineering

  24. arXiv:2010.16041  [pdf, other]

    eess.IV cs.CV cs.LG

    COVID-FACT: A Fully-Automated Capsule Network-based Framework for Identification of COVID-19 Cases from Chest CT scans

    Authors: Shahin Heidarian, Parnian Afshar, Nastaran Enshaei, Farnoosh Naderkhani, Anastasia Oikonomou, S. Farokh Atashzar, Faranak Babaki Fard, Kaveh Samimi, Konstantinos N. Plataniotis, Arash Mohammadi, Moezedin Javad Rafiee

    Abstract: The newly discovered Corona virus Disease 2019 (COVID-19) has been globally spreading and causing hundreds of thousands of deaths around the world as of its first emergence in late 2019. Computed tomography (CT) scans have shown distinctive features and higher sensitivity compared to other diagnostic tests, in particular the current gold standard, i.e., the Reverse Transcription Polymerase Chain R…

    Submitted 29 October, 2020; originally announced October 2020.

  25. arXiv:2009.14623  [pdf, other]

    eess.IV cs.CV cs.LG

    COVID-CT-MD: COVID-19 Computed Tomography (CT) Scan Dataset Applicable in Machine Learning and Deep Learning

    Authors: Parnian Afshar, Shahin Heidarian, Nastaran Enshaei, Farnoosh Naderkhani, Moezedin Javad Rafiee, Anastasia Oikonomou, Faranak Babaki Fard, Kaveh Samimi, Konstantinos N. Plataniotis, Arash Mohammadi

    Abstract: Novel Coronavirus (COVID-19) has drastically overwhelmed more than 200 countries affecting millions and claiming almost 1 million lives, since its emergence in late 2019. This highly contagious disease can easily spread, and if not controlled in a timely fashion, can rapidly incapacitate healthcare systems. The current standard diagnosis method, the Reverse Transcription Polymerase Chain Reaction…

    Submitted 28 September, 2020; originally announced September 2020.

  26. arXiv:2009.09930  [pdf, other]

    cs.IR cs.LG stat.ML

    AOBTM: Adaptive Online Biterm Topic Modeling for Version Sensitive Short-texts Analysis

    Authors: Mohammad Abdul Hadi, Fatemeh H Fard

    Abstract: Analysis of mobile app reviews has shown its important role in requirement engineering, software maintenance and evolution of mobile apps. Mobile app developers check their users' reviews frequently to clarify the issues experienced by users or capture the new issues that are introduced due to a recent app update. App reviews have a dynamic nature and their discussed topics change over time. The c…

    Submitted 13 September, 2020; originally announced September 2020.

    Comments: 13 pages, 7 figures, 7 tables

  27. ReviewViz: Assisting Developers Perform Empirical Study on Energy Consumption Related Reviews for Mobile Applications

    Authors: Mohammad Abdul Hadi, Fatemeh H Fard

    Abstract: Improving the energy efficiency of mobile applications is a topic that has gained a lot of attention recently. It has been addressed in a number of ways such as identifying energy bugs and developing a catalog of energy patterns. Previous work shows that users discuss the battery-related issues (energy inefficiency or energy consumption) of the apps in their reviews. However, there is no work that…

    Submitted 19 March, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

    Comments: 4 pages, 5 figures

  28. arXiv:2009.05936  [pdf, other]

    cs.HC

    Geo-Spatial Data Visualization and Critical Metrics Predictions for Canadian Elections

    Authors: Mohammad Abdul Hadi, Fatemeh H Fard, Irene Vrbik

    Abstract: Open data published by various organizations is intended to make the data available to the public. All over the world, numerous organizations maintain a considerable number of open databases containing a lot of facts and numbers. However, most of them do not offer a concise and insightful data interpretation or visualization tool, which can help users to process all of the information in a consist…

    Submitted 13 September, 2020; originally announced September 2020.

    Comments: 7 pages, 11 figures