Showing 1–24 of 24 results for author: Thung, F

  1. BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies

    Authors: Ratnadira Widyasari, Sheng Qin Sim, Camellia Lok, Haodi Qi, Jack Phan, Qijin Tay, Constance Tan, Fiona Wee, Jodie Ethelda Tan, Yuheng Yieh, Brian Goh, Ferdian Thung, Hong Jin Kang, Thong Hoang, David Lo, Eng Lieh Ouh

    Abstract: The 2019 edition of Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 edition of the survey. Unfortunately, despite the rapid increase in Python's popularity, there are not many testing and debugging tools that are designed for Python. This is in stark contrast with the a…

    Submitted 27 January, 2024; originally announced January 2024.

    Journal ref: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2020) 1556-1560

  2. arXiv:2310.16390  [pdf, other]

    cs.SE

    Evaluating Pre-trained Language Models for Repairing API Misuses

    Authors: Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, David Lo, Asankhaya Sharma, Lingxiao Jiang

    Abstract: API misuses often lead to software bugs, crashes, and vulnerabilities. While several API misuse detectors have been proposed, there are no automatic repair tools specifically designed for this purpose. In a recent study, test-suite-based automatic program repair (APR) tools were found to be ineffective in repairing API misuses. Still, since the study focused on non-learning-aided APR tools, it rem…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Under review by TOSEM

  3. arXiv:2310.11113  [pdf, other]

    cs.SE

    Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models

    Authors: Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, David Lo

    Abstract: Software development is an inherently collaborative process, where various stakeholders frequently express their opinions and emotions across diverse platforms. Recognizing the sentiments conveyed in these interactions is crucial for the effective development and ongoing maintenance of software systems. Over the years, many tools have been proposed to aid in sentiment analysis, but accurately iden…

    Submitted 19 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Submitted to TOSEM

  4. arXiv:2308.10022  [pdf, other]

    cs.SE

    Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection

    Authors: Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, David Lo

    Abstract: Duplicate bug report detection (DBRD) is a long-standing challenge in both academia and industry. Over the past decades, researchers have proposed various approaches to detect duplicate bug reports more accurately. With the recent advancement of deep learning, researchers have also proposed several approaches that leverage deep learning models to detect duplicate bug reports. A recent benchmarking…

    Submitted 27 August, 2023; v1 submitted 19 August, 2023; originally announced August 2023.

    Comments: Recently submitted to TOSEM

  5. arXiv:2304.05121  [pdf, other]

    cs.SE

    APISENS- Sentiment Scoring Tool for APIs with Crowd-Knowledge

    Authors: Kisub Kim, Ferdian Thung, Ting Zhang, Ivana Clairine Irsan, Ratnadira Widyasari, Zhou Yang, David Lo

    Abstract: Utilizing pre-existing software artifacts, such as libraries and Application Programming Interfaces (APIs), is crucial for software development efficiency. However, the abundance of artifacts that provide similar functionality can lead to confusion among developers, resulting in a challenge for proper selection and implementation. Through our preliminary investigation, we found that utilizing the…

    Submitted 11 April, 2023; originally announced April 2023.

  6. arXiv:2304.02514  [pdf, other]

    cs.SE

    APIHarvest: Harvesting API Information from Various Online Sources

    Authors: Ferdian Thung, Kisub Kim, Ting Zhang, Ivana Clairine Irsan, Ratnadira Widyasari, Zhou Yang, David Lo

    Abstract: Using APIs to develop software applications is the norm. APIs help developers to build applications faster as they do not need to reinvent the wheel. It is therefore important for developers to understand the APIs that they plan to use. Developers should also make themselves aware of relevant information updates about APIs. In order to do so, developers need to find and keep track of relevant info…

    Submitted 5 April, 2023; originally announced April 2023.

  7. arXiv:2303.12299  [pdf, other]

    cs.SE

    PICASO: Enhancing API Recommendations with Relevant Stack Overflow Posts

    Authors: Ivana Clairine Irsan, Ting Zhang, Ferdian Thung, Kisub Kim, David Lo

    Abstract: While having options could be liberating, too many options could lead to the sub-optimal solution being chosen. This is not an exception in the software engineering domain. Nowadays, API has become imperative in making software developers' life easier. APIs help developers implement a function faster and more efficiently. However, given the large number of open-source libraries to choose from, cho…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted at MSR 2023

  8. arXiv:2303.06853  [pdf, ps, other]

    cs.SE

    Representation Learning for Stack Overflow Posts: How Far are We?

    Authors: Junda He, Zhou Xin, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Irsan, David Lo

    Abstract: The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, thus motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the selection of representation model for Stack Overflow posts. As the volume of literature on Stack Overflow continues to burgeon, it highlights t…

    Submitted 9 April, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

  9. arXiv:2303.06286  [pdf, other]

    cs.SE

    NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python

    Authors: Ratnadira Widyasari, Zhou Yang, Ferdian Thung, Sheng Qin Sim, Fiona Wee, Camellia Lok, Jack Phan, Haodi Qi, Constance Tan, Qijin Tay, David Lo

    Abstract: Machine learning (ML) has gained much attention and been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts in filtering those projects to curate ML projects of high quality. The limited availability of such a high-quality dataset poses an obstacle in understanding ML projects. To help…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted by MSR 2023

  10. arXiv:2212.00548  [pdf, other]

    cs.SE

    Duplicate Bug Report Detection: How Far Are We?

    Authors: Ting Zhang, DongGyun Han, Venkatesh Vinayakarao, Ivana Clairine Irsan, Bowen Xu, Ferdian Thung, David Lo, Lingxiao Jiang

    Abstract: Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature. The industry uses some other techniques. Unfortunately, there is insufficient comparison among them, and it is unclear how far we have been. This work fills this gap by comparing the aforementioned techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: Accepted by ACM Transactions on Software Engineering and Methodology

  11. arXiv:2209.10868  [pdf, other]

    cs.SE

    Answer Summarization for Technical Queries: Benchmark and New Approach

    Authors: Yang Chengran, Bowen Xu, Ferdian Thung, Yucen Shi, Ting Zhang, Zhou Yang, Xin Zhou, Jieke Shi, Junda He, DongGyun Han, David Lo

    Abstract: Prior studies have demonstrated that approaches to generate an answer summary for a given technical query in Software Question and Answer (SQA) sites are desired. We find that existing approaches are assessed solely through user studies. There is a need for a benchmark with ground truth summaries to complement assessment through user studies. Unfortunately, such a benchmark is non-existent for ans…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Accepted by ASE 2022

  12. arXiv:2206.11619  [pdf, other]

    cs.SE

    AutoPRTitle: A Tool for Automatic Pull Request Title Generation

    Authors: Ivana Clairine Irsan, Ting Zhang, Ferdian Thung, David Lo, Lingxiao Jiang

    Abstract: With the rise of the pull request mechanism in software development, the quality of pull requests has gained more attention. Prior works focus on improving the quality of pull request descriptions and several approaches have been proposed to automatically generate pull request descriptions. As an essential component of a pull request, pull request titles have not received a similar level of attent…

    Submitted 5 August, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

    Comments: Accepted by the ICSME'22 Tool Demonstration Track

  13. iTiger: An Automatic Issue Title Generation Tool

    Authors: Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, DongGyun Han, David Lo, Lingxiao Jiang

    Abstract: In both commercial and open-source software, bug reports or issues are used to track bugs or feature requests. However, the quality of issues can differ a lot. Prior research has found that bug reports with good quality tend to gain more attention than the ones with poor quality. As an essential component of an issue, title quality is an important aspect of issue quality. Moreover, issues are usua…

    Submitted 31 August, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted by the ESEC/FSE 2022 Demonstrations Track

  14. arXiv:2206.10430  [pdf, other]

    cs.SE

    Automatic Pull Request Title Generation

    Authors: Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, DongGyun Han, David Lo, Lingxiao Jiang

    Abstract: Pull Requests (PRs) are a mechanism on modern collaborative coding platforms, such as GitHub. PRs allow developers to tell others that their code changes are available for merging into another branch in a repository. A PR needs to be reviewed and approved by the core team of the repository before the changes are merged into the branch. Usually, reviewers need to identify a PR that is in line with…

    Submitted 30 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted by the ICSME'22 research track

  15. On the Effectiveness of Pretrained Models for API Learning

    Authors: Mohammad Abdul Hadi, Imam Nur Bani Yusuf, Ferdian Thung, Kien Gia Luong, Jiang Lingxiao, Fatemeh H. Fard, David Lo

    Abstract: Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches utilize information retrieval models to search for matching API…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 12 pages, 4 figures, ICPC 2022

    Journal ref: 30th International Conference on Program Comprehension (ICPC '22), May 16–17, 2022, Virtual Event, USA

  16. arXiv:2203.04519  [pdf, other]

    cs.SE cs.MM

    Efficient Search of Live-Coding Screencasts from Online Videos

    Authors: Chengran Yang, Ferdian Thung, David Lo

    Abstract: Programming videos on the Internet are valuable resources for learning programming skills. To find relevant videos, developers typically search online video platforms (e.g., YouTube) with keywords on topics they wish to learn. Developers often look for live-coding screencasts, in which the videos' authors perform live coding. Yet, not all programming videos are live-coding screencasts. In this wor…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: accepted by SANER 2022

  17. arXiv:2111.07238  [pdf, other]

    cs.SE cs.AI cs.PL

    FACOS: Finding API Relevant Contents on Stack Overflow with Semantic and Syntactic Analysis

    Authors: Kien Luong, Mohammad Hadi, Ferdian Thung, Fatemeh Fard, David Lo

    Abstract: Collecting API examples, usages, and mentions relevant to a specific API method over discussions on venues such as Stack Overflow is not a trivial problem. It requires efforts to correctly recognize whether the discussion refers to the API method that developers/tools are searching for. The content of the thread, which consists of both text paragraphs describing the involvement of the API method i…

    Submitted 13 November, 2021; originally announced November 2021.

  18. arXiv:2102.01859  [pdf, other]

    cs.SE

    BiasFinder: Metamorphic Test Generation to Uncover Bias for Sentiment Analysis Systems

    Authors: Muhammad Hilmi Asyrofi, Zhou Yang, Imam Nur Bani Yusuf, Hong Jin Kang, Ferdian Thung, David Lo

    Abstract: Artificial Intelligence (AI) software systems, such as Sentiment Analysis (SA) systems, typically learn from large amounts of data that may reflect human biases. Consequently, the machine learning model in such software systems may exhibit unintended demographic bias based on specific characteristics (e.g., gender, occupation, country-of-origin, etc.). Such biases manifest in an SA system when it…

    Submitted 4 October, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

  19. arXiv:2012.07259  [pdf, other]

    cs.SE

    AndroEvolve: Automated Update for Android Deprecated-API Usages

    Authors: Stefanus Agus Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, Gilles Muller

    Abstract: Android operating system (OS) is often updated, where each new version may involve API deprecation. Usages of deprecated APIs in Android apps need to be updated to ensure the apps' compatibility with the old and new versions of Android OS. In this work, we propose AndroEvolve, an automated tool to update usages of deprecated Android APIs, that addresses the limitations of the state-of-the-art tool…

    Submitted 11 February, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

  20. arXiv:2011.05020  [pdf, other]

    cs.SE

    AndroEvolve: Automated Android API Update with Data Flow Analysis and Variable Denormalization

    Authors: Stefanus A. Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, Gilles Muller

    Abstract: The Android operating system is frequently updated, with each version bringing a new set of APIs. New versions may involve API deprecation; Android apps using deprecated APIs need to be updated to ensure the apps' compatibility with old and new versions of Android. Updating deprecated APIs is a time-consuming endeavor. Hence, automating the updates of Android APIs can be beneficial for developers.…

    Submitted 10 November, 2020; originally announced November 2020.

  21. arXiv:2011.04962  [pdf, other]

    cs.SE

    Characterization and Automatic Update of Deprecated Machine-Learning API Usages

    Authors: Stefanus Agus Haryono, Ferdian Thung, David Lo, Julia Lawall, Lingxiao Jiang

    Abstract: Due to the rise of AI applications, machine learning libraries have become far more accessible, with Python being the most common programming language to write them. Machine learning libraries tend to be updated periodically, which may deprecate existing APIs, making it necessary for developers to update their usages. However, updating usages of deprecated APIs is typically not a priority for dev…

    Submitted 10 November, 2020; originally announced November 2020.

  22. arXiv:2005.13220  [pdf, other]

    cs.SE

    Automatic Android Deprecated-API Usage Update by Learning from Single Updated Example

    Authors: Stefanus Agus Haryono, Ferdian Thung, Hong Jin Kang, Lucas Serrano, Gilles Muller, Julia Lawall, David Lo, Lingxiao Jiang

    Abstract: Due to the deprecation of APIs in the Android operating system, developers have to update usages of the APIs to ensure that their applications work for both the past and current versions of Android. Such updates may be widespread, non-trivial, and time-consuming. Therefore, automation of such updates will be of great benefit to developers. AppEvolve, which is the state-of-the-art tool for automating…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 5 pages, 8 figures. Accepted in The International Conference on Program Comprehension (ICPC) 2020, ERA Track

    ACM Class: I.2.2

  23. arXiv:1802.06997  [pdf, other]

    cs.SE

    Categorizing the Content of GitHub README Files

    Authors: Gede Artha Azriadi Prana, Christoph Treude, Ferdian Thung, Thushari Atapattu, David Lo

    Abstract: README files play an essential role in shaping a developer's first impression of a software repository and in documenting the software project that the repository hosts. Yet, we lack a systematic understanding of the content of a typical README file as well as tools that can process these files automatically. To close this gap, we conduct a qualitative study involving the manual annotation of 4,22…

    Submitted 30 July, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

  24. WebAPIRec: Recommending Web APIs to Software Projects via Personalized Ranking

    Authors: Ferdian Thung, Richard J. Oentaryo, David Lo, Yuan Tian

    Abstract: Application programming interfaces (APIs) offer a plethora of functionalities for developers to reuse without reinventing the wheel. Identifying the appropriate APIs given a project requirement is critical for the success of a project, as many functionalities can be reused to achieve faster development. However, the massive number of APIs would often hinder the developers' ability to quickly find…

    Submitted 1 May, 2017; originally announced May 2017.

    Comments: IEEE Transactions on Emerging Topics in Computational Intelligence, 2017

    Journal ref: IEEE Transactions on Emerging Topics in Computational Intelligence 2017