Showing 1–50 of 64 results for author: Bissyande, T F

  1. arXiv:2407.07804  [pdf, other]

    cs.SE

    Call Graph Soundness in Android Static Analysis

    Authors: Jordan Samhi, René Just, Tegawendé F. Bissyandé, Michael D. Ernst, Jacques Klein

    Abstract: Static analysis is sound in theory, but an implementation may unsoundly fail to analyze all of a program's code. Any such omission is a serious threat to the validity of the tool's output. Our work is the first to measure the prevalence of these omissions. Previously, researchers and analysts did not know what is missed by static analysis, what sort of code is missed, or the reasons behind these o…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2407.06573  [pdf, other]

    cs.SE

    LLM for Mobile: An Initial Roadmap

    Authors: Daihang Chen, Yonghui Liu, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Shuai Wang, Xiao Chen, Tegawendé F. Bissyandé, Jacques Klein, Li Li

    Abstract: When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to apply LLMs for the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable nativ…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2407.00225  [pdf, other]

    cs.SE

    Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation

    Authors: Wendkûuni C. Ouédraogo, Kader Kaboré, Haoye Tian, Yewei Song, Anil Koyuncu, Jacques Klein, David Lo, Tegawendé F. Bissyandé

    Abstract: Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints. Automated test generation techniques have emerged to address this, but often lack readability and require developer intervention. Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation. However…

    Submitted 28 June, 2024; originally announced July 2024.

  4. arXiv:2406.13972  [pdf, other]

    cs.SE

    CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

    Authors: Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin

    Abstract: Program repair techniques offer cost-saving benefits for debugging within software development and programming education scenarios. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, it is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially ca…

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2404.12636  [pdf, other]

    cs.SE

    Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs

    Authors: Boyang Yang, Haoye Tian, Jiadong Ren, Hongyu Zhang, Jacques Klein, Tegawendé F. Bissyandé, Claire Le Goues, Shunfu Jin

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities on a broad spectrum of downstream tasks. Within the realm of software engineering, specialized tasks on code, such as program repair, present unique challenges, necessitating fine-tuning to unlock state-of-the-art performance. Fine-tuning approaches proposed in the literature for LLMs on program repair tasks are however general…

    Submitted 22 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  6. arXiv:2404.08817  [pdf, other]

    cs.CL cs.PL cs.SE

    Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance

    Authors: Yewei Song, Cedric Lothritz, Daniel Tang, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: This paper revisits recent code similarity evaluation metrics, particularly focusing on the application of Abstract Syntax Tree (AST) editing distance in diverse programming languages. In particular, we explore the usefulness of these metrics and compare them to traditional sequence similarity metrics. Our experiments showcase the effectiveness of AST editing distance in capturing intricate code s…

    Submitted 3 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: ACL 2024 Main

  7. arXiv:2402.03782  [pdf, other]

    cs.CL cs.AI

    Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More

    Authors: Fred Philippy, Siwen Guo, Shohreh Haddadan, Cedric Lothritz, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models (PLMs) to specific tasks by inserting learnable embeddings, or soft prompts, at the input layer of the PLM, without modifying its parameters. This paper investigates the potential of SPT for cross-lingual transfer. Unlike previous studies on SPT for cross-lingual transfer that often fine-tune both the…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted at the 1st Workshop on Modular and Open Multilingual NLP (co-located with EACL 2024)

  8. arXiv:2402.02172  [pdf, other]

    cs.SE

    CodeAgent: Collaborative Agents for Software Engineering

    Authors: Daniel Tang, Kisub Kim, Yewei Song, Cedric Lothritz, Bei Li, Saad Ezzini, Haoye Tian, Jacques Klein, Tegawende F. Bissyande

    Abstract: Code review, which aims at ensuring the overall quality and reliability of software, is a cornerstone of software development. Unfortunately, while crucial, code review is a labor-intensive process that the research community is looking to automate. Existing automated methods rely on single input-output generative models and thus generally struggle to emulate the collaborative nature of code revie…

    Submitted 28 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  9. arXiv:2312.14898  [pdf, other]

    cs.SE

    Enriching Automatic Test Case Generation by Extracting Relevant Test Inputs from Bug Reports

    Authors: Wendkûuni C. Ouédraogo, Laura Plein, Kader Kaboré, Andrew Habib, Jacques Klein, David Lo, Tegawendé F. Bissyandé

    Abstract: The quality of a software is highly dependent on the quality of the tests it is submitted to. Writing tests for bug detection is thus essential. However, it is time-consuming when done manually. Automating test cases generation has therefore been an exciting research area in the software engineering community. Most approaches have been focused on generating unit tests. Unfortunately, current effor…

    Submitted 22 December, 2023; originally announced December 2023.

  10. arXiv:2311.01311  [pdf, other]

    cs.SE

    Software Engineering for OpenHarmony: A Research Roadmap

    Authors: Li Li, Xiang Gao, Hailong Sun, Chunming Hu, Xiaoyu Sun, Haoyu Wang, Haipeng Cai, Ting Su, Xiapu Luo, Tegawendé F. Bissyandé, Jacques Klein, John Grundy, Tao Xie, Haibo Chen, Huaimin Wang

    Abstract: Mobile software engineering has been a hot research topic for decades. Our fellow researchers have proposed various approaches (with over 7,000 publications for Android alone) in this field that essentially contributed to the great success of the current mobile ecosystem. Existing research efforts mainly focus on popular mobile platforms, namely Android and iOS. OpenHarmony, a newly open-sourced m…

    Submitted 21 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

  11. arXiv:2310.12753   

    cs.SE

    Patch-CLIP: A Patch-Text Pre-Trained Model

    Authors: Xunzhu Tang, Zhenghan Chen, Saad Ezzini, Haoye Tian, Jacques Klein, Tegawende F. Bissyande

    Abstract: In recent years, patch representation learning has emerged as a necessary research direction for exploiting the capabilities of machine learning in software generation. These representations have driven significant performance enhancements across a variety of tasks involving code changes. While the progress is undeniable, a common limitation among existing models is their specialization: they pred…

    Submitted 30 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: The paper is incomplete, causing much confusion for the community

  12. arXiv:2310.07290  [pdf, other]

    cs.SE

    Revisiting Android App Categorization

    Authors: Marco Alecci, Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Numerous tools rely on automatic categorization of Android apps as part of their methodology. However, incorrect categorization can lead to inaccurate outcomes, such as a malware detector wrongly flagging a benign app as malicious. One such example is the SlideIT Free Keyboard app, which has over 500000 downloads on Google Play. Despite being a "Keyboard" app, it is often wrongly categorized along…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at ICSE2024

  13. arXiv:2310.06320  [pdf, other]

    cs.SE

    Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models

    Authors: Laura Plein, Wendkûuni C. Ouédraogo, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: Software testing is a core discipline in software engineering where a large array of research results has been produced, notably in the area of automatic test generation. Because existing approaches produce test cases that either can be qualified as simple (e.g. unit tests) or that require precise specifications, most testing procedures still rely on test cases written by humans to form test suite…

    Submitted 10 October, 2023; originally announced October 2023.

  14. arXiv:2310.06310  [pdf, other]

    cs.SE

    Can LLMs Demystify Bug Reports?

    Authors: Laura Plein, Tegawendé F. Bissyandé

    Abstract: Bugs are notoriously challenging: they slow down software users and result in time-consuming investigations for developers. These challenges are exacerbated when bugs must be reported in natural language by users. Indeed, we lack reliable tools to automatically address reported bugs (i.e., enabling their analysis, reproduction, and bug fixing). With the recent promises created by LLMs such as Chat…

    Submitted 10 October, 2023; originally announced October 2023.

  15. Practical Program Repair via Preference-based Ensemble Strategy

    Authors: Wenkang Zhong, Chuanyi Li, Kui Liu, Tongtong Xu, Tegawendé F. Bissyandé, Jidong Ge, Bin Luo, Vincent Ng

    Abstract: To date, over 40 Automated Program Repair (APR) tools have been designed with varying bug-fixing strategies, which have been demonstrated to have complementary performance in terms of being effective for different bug classes. Intuitively, it should be feasible to improve the overall bug-fixing performance of APR via assembling existing tools. Unfortunately, simply invoking all available APR tools…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: accepted by icse2024 early

  16. arXiv:2308.16586  [pdf, other]

    cs.SE

    Learning to Represent Patches

    Authors: Xunzhu Tang, Haoye Tian, Zhenghan Chen, Weiguo Pian, Saad Ezzini, Abdoul Kader Kabore, Andrew Habib, Jacques Klein, Tegawende F. Bissyande

    Abstract: Patch representation is crucial in automating various software engineering tasks, like determining patch accuracy or summarizing code changes. While recent research has employed deep learning for patch representation, focusing on token sequences or Abstract Syntax Trees (ASTs), they often miss the change's semantic intent and the context of modified lines. To bridge this gap, we introduce a novel…

    Submitted 3 October, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

  17. arXiv:2308.15234  [pdf, other]

    cs.SE

    Hyperbolic Code Retrieval: A Novel Approach for Efficient Code Search Using Hyperbolic Space Embeddings

    Authors: Xunzhu Tang, Zhenghan Chen, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, Tegawende F. Bissyande

    Abstract: Within the realm of advanced code retrieval, existing methods have primarily relied on intricate matching and attention-based mechanisms. However, these methods often lead to computational and memory inefficiencies, posing a significant challenge to their real-world applicability. To tackle this challenge, we propose a novel approach, the Hyperbolic Code QA Matching (HyCoQA). This approach leverag…

    Submitted 29 August, 2023; originally announced August 2023.

  18. arXiv:2308.15233  [pdf, other]

    cs.SE

    Multilevel Semantic Embedding of Software Patches: A Fine-to-Coarse Grained Approach Towards Security Patch Detection

    Authors: Xunzhu Tang, Zhenghan Chen, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, Tegawende F. Bissyande

    Abstract: The growth of open-source software has increased the risk of hidden vulnerabilities that can affect downstream software applications. This concern is further exacerbated by software vendors' practice of silently releasing security patches without explicit warnings or common vulnerability and exposure (CVE) notifications. This lack of transparency leaves users unaware of potential security threats,…

    Submitted 29 August, 2023; originally announced August 2023.

  19. arXiv:2308.12701  [pdf, other]

    cs.SE

    How are We Detecting Inconsistent Method Names? An Empirical Study from Code Review Perspective

    Authors: Kisub Kim, Xin Zhou, Dongsun Kim, Julia Lawall, Kui Liu, Tegawendé F. Bissyandé, Jacques Klein, Jaekwon Lee, David Lo

    Abstract: Proper naming of methods can make program code easier to understand, and thus enhance software maintainability. Yet, developers may use inconsistent names due to poor communication or a lack of familiarity with conventions within the software development lifecycle. To address this issue, much research effort has been invested into building automatic tools that can check for method name inconsisten…

    Submitted 24 August, 2023; originally announced August 2023.

  20. arXiv:2308.01413  [pdf, other]

    cs.CL cs.AI

    LaFiCMIL: Rethinking Large File Classification from the Perspective of Correlated Multiple Instance Learning

    Authors: Tiezhu Sun, Weiguo Pian, Nadia Daoudi, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Transformer-based models have significantly advanced natural language processing, in particular the performance in text classification tasks. Nevertheless, these models face challenges in processing large files, primarily due to their input constraints, which are generally restricted to hundreds or thousands of tokens. Attempts to address this issue in existing models usually consist in extracting…

    Submitted 23 May, 2024; v1 submitted 30 July, 2023; originally announced August 2023.

    Comments: Accepted at NLDB 2024

  21. AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis

    Authors: Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Android app developers extensively employ code reuse, integrating many third-party libraries into their apps. While such integration is practical for developers, it can be challenging for static analyzers to achieve scalability and precision when libraries account for a large part of the code. As a direct consequence, it is common practice in the literature to consider developer code only during s…

    Submitted 8 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

  22. arXiv:2304.11938  [pdf, other]

    cs.SE cs.AI

    Is ChatGPT the Ultimate Programming Assistant -- How far is it?

    Authors: Haoye Tian, Weiqi Lu, Tsz On Li, Xunzhu Tang, Shing-Chi Cheung, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: Recently, the ChatGPT LLM has received great attention: it can be used as a bot for discussing source code, prompting it to suggest changes, provide descriptions or even generate code. Typical demonstrations generally focus on existing benchmarks, which may have been used in model training (i.e., data leakage). To assess the feasibility of using an LLM as a useful assistant bot for programmers, we…

    Submitted 31 August, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  23. arXiv:2301.03207  [pdf, other]

    cs.CR cs.SE

    Negative Results of Fusing Code and Documentation for Learning to Accurately Identify Sensitive Source and Sink Methods An Application to the Android Framework for Data Leak Detection

    Authors: Jordan Samhi, Maria Kober, Abdoul Kader Kabore, Steven Arzt, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Apps on mobile phones manipulate all sorts of data, including sensitive data, leading to privacy-related concerns. Recent regulations like the European GDPR provide rules for the processing of personal and sensitive data, like that no such data may be leaked without the consent of the user. Researchers have proposed sophisticated approaches to track sensitive data within mobile apps, all of whic…

    Submitted 11 January, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: 30th IEEE International Conference on Software Analysis, Evolution and Reengineering, RENE track

  24. arXiv:2301.02818  [pdf, other]

    cs.SE cs.AI

    App Review Driven Collaborative Bug Finding

    Authors: Xunzhu Tang, Haoye Tian, Pingfan Kong, Kui Liu, Jacques Klein, Tegawendé F. Bissyande

    Abstract: Software development teams generally welcome any effort to expose bugs in their code base. In this work, we build on the hypothesis that mobile apps from the same category (e.g., two web browser apps) may be affected by similar bugs in their evolution process. It is therefore possible to transfer the experience of one historical app to quickly find bugs in its new counterparts. This has been refer…

    Submitted 23 January, 2023; v1 submitted 7 January, 2023; originally announced January 2023.

  25. arXiv:2212.05976  [pdf, other]

    cs.SE cs.AI

    DexBERT: Effective, Task-Agnostic and Fine-grained Representation Learning of Android Bytecode

    Authors: Tiezhu Sun, Kevin Allix, Kisub Kim, Xin Zhou, Dongsun Kim, David Lo, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: The automation of a large number of software engineering tasks is becoming possible thanks to Machine Learning (ML). Central to applying ML to software artifacts (like source or executable code) is converting them into forms suitable for learning. Traditionally, researchers have relied on manually selected features, based on expert knowledge which is sometimes imprecise and generally incomplete. R…

    Submitted 24 August, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted by IEEE TSE, 2023

  26. arXiv:2212.01635  [pdf, other]

    cs.SE cs.AI

    AI-driven Mobile Apps: an Explorative Study

    Authors: Yinghua Li, Xueqi Dang, Haoye Tian, Tiezhu Sun, Zhijie Wang, Lei Ma, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: The integration of artificial intelligence (AI) into mobile applications has significantly transformed various domains, enhancing user experiences and providing personalized services through advanced machine learning (ML) and deep learning (DL) technologies. AI-driven mobile apps typically refer to applications that leverage ML/DL technologies to perform key tasks such as image recognition and nat…

    Submitted 8 June, 2024; v1 submitted 3 December, 2022; originally announced December 2022.

  27. arXiv:2211.01752  [pdf]

    cs.SE

    A Comparative Study of Smartphone and Smart TV Apps

    Authors: Yonghui Liu, Xiao Chen, Yue Liu, Pingfan Kong, Tegawendé F. Bissyande, Jacques Klein, Xiaoyu Sun, Chunyang Chen, John Grundy

    Abstract: Context: Smart TVs have become one of the most popular television types. Many app developers and service providers have designed TV versions for their smartphone applications. Despite the extensive studies on mobile app analysis, its TV equivalents receive far too little attention. The relationship between phone and TV has not been the subject of research works. Objective: In this paper, we aim to…

    Submitted 3 November, 2022; originally announced November 2022.

  28. arXiv:2210.10997  [pdf, other]

    cs.CR cs.SE

    Demystifying Hidden Sensitive Operations in Android apps

    Authors: Xiaoyu Sun, Xiao Chen, Li Li, Haipeng Cai, John Grundy, Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Security of Android devices is now paramount, given their wide adoption among consumers. As researchers develop tools for statically or dynamically detecting suspicious apps, malware writers regularly update their attack mechanisms to hide malicious behavior implementation. This poses two problems to current research techniques: static analysis approaches, given their over-approximations, can repo…

    Submitted 19 October, 2022; originally announced October 2022.

    Journal ref: ACM Transactions on Software Engineering and Methodology, 2022

  29. Is this Change the Answer to that Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness

    Authors: Haoye Tian, Xunzhu Tang, Andrew Habib, Shangwen Wang, Kui Liu, Xin Xia, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: In this work, we propose a novel perspective to the problem of patch correctness assessment: a correct patch implements changes that "answer" to a problem posed by buggy behaviour. Concretely, we turn the patch correctness assessment into a Question Answering problem. To tackle this problem, our intuition is that natural language processing can provide the necessary representations and models for…

    Submitted 1 September, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

  30. arXiv:2206.06460  [pdf, other]

    cs.SE cs.AI

    MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning

    Authors: Weiguo Pian, Hanyu Peng, Xunzhu Tang, Tiezhu Sun, Haoye Tian, Andrew Habib, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: Representation learning of source code is essential for applying machine learning to software engineering tasks. Learning code representation from a multilingual source code dataset has been shown to be more effective than learning from single-language datasets separately, since more training data from multilingual dataset improves the model's ability to extract language-agnostic information from…

    Submitted 5 December, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Accepted by AAAI 2023

  31. arXiv:2205.08265  [pdf, other]

    cs.CR cs.AI

    A two-steps approach to improve the performance of Android malware detectors

    Authors: Nadia Daoudi, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: The popularity of Android OS has made it an appealing target to malware developers. To evade detection, including by ML-based techniques, attackers invest in creating malware that closely resemble legitimate apps. In this paper, we propose GUIDED RETRAINING, a supervised representation learning-based method that boosts the performance of a malware detector. First, the dataset is split into "easy"…

    Submitted 17 May, 2022; originally announced May 2022.

  32. arXiv:2203.08912  [pdf, other]

    cs.SE

    The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches

    Authors: Haoye Tian, Kui Liu, Yinghua Li, Abdoul Kader Kaboré, Anil Koyuncu, Andrew Habib, Li Li, Junhao Wen, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: A large body of the literature on automated program repair develops approaches where patches are automatically generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches, although validated by the oracle, may actually be incorrect. Our empirical work investigates different representation learning approaches for code changes to d…

    Submitted 12 November, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2008.02944

  33. arXiv:2203.04448  [pdf, other]

    cs.CR cs.SE

    TriggerZoo: A Dataset of Android Applications Automatically Infected with Logic Bombs

    Authors: Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Many Android apps analyzers rely, among other techniques, on dynamic analysis to monitor their runtime behavior and detect potential security threats. However, malicious developers use subtle, though efficient, techniques to bypass dynamic analyzers. Logic bombs are examples of popular techniques where the malicious code is triggered only under specific circumstances, challenging comprehensive dyn…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: In the proceedings of the 19th International Conference on Mining Software Repositories, Data Showcase, (MSR 2022)

  34. arXiv:2112.10470  [pdf, other]

    cs.CR cs.SE

    Difuzer: Uncovering Suspicious Hidden Sensitive Operations in Android Apps

    Authors: Jordan Samhi, Li Li, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: One prominent tactic used to keep malicious behavior from being detected during dynamic test campaigns is logic bombs, where malicious operations are triggered only when specific conditions are satisfied. Defusing logic bombs remains an unsolved problem in the literature. In this work, we propose to investigate Suspicious Hidden Sensitive Operations (SHSOs) as a step towards triaging logic bombs.…

    Submitted 23 January, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: In the proceedings of the 44th International Conference on Software Engineering 2022 (ICSE 2022)

  35. arXiv:2112.10469  [pdf, other]

    cs.SE

    JuCify: A Step Towards Android Code Unification for Enhanced Static Analysis

    Authors: Jordan Samhi, Jun Gao, Nadia Daoudi, Pierre Graux, Henri Hoyez, Xiaoyu Sun, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Native code is now commonplace within Android app packages where it co-exists and interacts with Dex bytecode through the Java Native Interface to deliver rich app functionalities. Yet, state-of-the-art static analysis approaches have mostly overlooked the presence of such native code, which, however, may implement some key sensitive, or even malicious, parts of the app behavior. This limitation o…

    Submitted 23 January, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: In the proceedings of the 44th International Conference on Software Engineering 2022 (ICSE 2022)

  36. arXiv:2112.10123  [pdf, other]

    cs.SE cs.CL cs.LG

    Early Detection of Security-Relevant Bug Reports using Machine Learning: How Far Are We?

    Authors: Arthur D. Sawadogo, Quentin Guimard, Tegawendé F. Bissyandé, Abdoul Kader Kaboré, Jacques Klein, Naouel Moha

    Abstract: Bug reports are common artefacts in software development. They serve as the main channel for users to communicate to developers information about the issues that they encounter when using released versions of software programs. In the descriptions of issues, however, a user may, intentionally or not, expose a vulnerability. In a typical maintenance scenario, such security-relevant bug reports are…

    Submitted 19 December, 2021; originally announced December 2021.

    Comments: 10 pages

  37. arXiv:2111.07739  [pdf, other]

    cs.SE

    Beep: Fine-grained Fix Localization by Learning to Predict Buggy Code Elements

    Authors: Shangwen Wang, Kui Liu, Bo Lin, Li Li, Jacques Klein, Xiaoguang Mao, Tegawendé F. Bissyandé

    Abstract: Software Fault Localization refers to the activity of finding code elements (e.g., statements) that are related to a software failure. The state-of-the-art fault localization techniques, however, produce coarse-grained results that can deter manual debugging or mislead automated repair tools. In this work, we focus specifically on the fine-grained identification of code elements (i.e., tokens) tha…

    Submitted 15 November, 2021; originally announced November 2021.

  38. arXiv:2109.03326  [pdf, ps, other]

    cs.CR cs.LG

    DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection based on Image Representation of Bytecode

    Authors: Nadia Daoudi, Jordan Samhi, Abdoul Kader Kabore, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Computer vision has witnessed several advances in recent years, with unprecedented performance provided by deep representation learning research. Image formats thus appear attractive to other fields such as malware detection, where deep learning on images alleviates the need for comprehensively hand-crafted features generalising to different malware variants. We postulate that this research direct…

    Submitted 5 September, 2021; originally announced September 2021.

    Comments: This manuscript has been accepted at MLHat 2021, and it will be archived in Springer Communications in Computer and Information Science (CCIS)

  39. arXiv:2107.13296  [pdf, other]

    cs.SE cs.AI

    Predicting Patch Correctness Based on the Similarity of Failing Test Cases

    Authors: Haoye Tian, Yinghua Li, Weiguo Pian, Abdoul Kader Kaboré, Kui Liu, Andrew Habib, Jacques Klein, Tegawendé F. Bissyande

    Abstract: Towards predicting patch correctness in APR, we propose a simple, but novel hypothesis on how the link between the patch behaviour and failing test specifications can be drawn: similar failing test cases should require similar patches. We then propose BATS, an unsupervised learning-based system to predict patch correctness by checking patch Behaviour Against failing Test Specification. BATS exploi…

    Submitted 16 March, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

  40. arXiv:2012.09916  [pdf, other]

    cs.SE

    RAICC: Revealing Atypical Inter-Component Communication in Android Apps

    Authors: Jordan Samhi, Alexandre Bartel, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Inter-Component Communication (ICC) is a key mechanism in Android. It enables developers to compose rich functionalities and explore reuse within and across apps. Unfortunately, as reported by a large body of literature, ICC is rather "complex and largely unconstrained", leaving room to a lack of precision in apps modeling. To address the challenge of tracking ICCs within apps, state of the art st…

    Submitted 15 January, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: In the proceedings of the 43rd International Conference on Software Engineering 2021 (ICSE 2021)

  41. IBIR: Bug Report driven Fault Injection

    Authors: Ahmed Khanfir, Anil Koyuncu, Mike Papadakis, Maxime Cordy, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon

    Abstract: Much research on software engineering and software testing relies on experimental studies based on fault injection. Fault injection, however, is not often relevant to emulate real-world software faults since it "blindly" injects large numbers of faults. It remains indeed challenging to inject few but realistic faults that target a particular functionality in a program. In this work, we introduce I…

    Submitted 11 December, 2020; originally announced December 2020.

  42. arXiv:2011.13280  [pdf, other]

    cs.SE

    FlexiRepair: Transparent Program Repair with Generic Patches

    Authors: Anil Koyuncu, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon

    Abstract: Template-based program repair research is in need of a common ground to express fix patterns in a standard and reusable manner. We propose to build on the concept of generic patch (also known as semantic patch), which is widely used in the Linux community to automate code evolution. We advocate that generic patches could provide at the same time a unified representation and a specification for fi…

    Submitted 26 November, 2020; originally announced November 2020.

  43. arXiv:2008.02944  [pdf, other]

    cs.SE

    Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair

    Authors: Haoye Tian, Kui Liu, Abdoul Kader Kaboré, Anil Koyuncu, Li Li, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: A large body of the literature of automated program repair develops approaches where patches are generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches, although validated by the oracle, may actually be incorrect. While the state of the art explores research directions that require dynamic information or rely on manually-craf…

    Submitted 6 August, 2020; originally announced August 2020.

  44. arXiv:2008.01676  [pdf, other]

    cs.SE

    Anchor: Locating Android Framework-specific Crashing Faults

    Authors: Pingfan Kong, Li Li, Jun Gao, Timothée Riom, Yanjie Zhao, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Android framework-specific app crashes are hard to debug. Indeed, the callback-based event-driven mechanism of Android challenges crash localization techniques that are developed for traditional Java programs. The key challenge stems from the fact that the buggy code location may not even be listed within the stack trace. For example, our empirical study on 500 framework-specific crashes from an o…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: 12 pages

  45. On the Efficiency of Test Suite based Program Repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

    Authors: Kui Liu, Shangwen Wang, Anil Koyuncu, Kisub Kim, Tegawendé F. Bissyandé, Dongsun Kim, Peng Wu, Jacques Klein, Xiaoguang Mao, Yves Le Traon

    Abstract: Test-based automated program repair has been a prolific field of research in software engineering in the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but affordable, approximation to program specifications. Although the literature regularly sets new records on the number of benchmark bugs that can be fixed, several studies increasingly raise concern…

    Submitted 3 August, 2020; originally announced August 2020.

  46. arXiv:2006.11002  [pdf, other]

    cs.SE cs.CY

    A First Look at Android Applications in Google Play related to Covid-19

    Authors: Jordan Samhi, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Due to the convenience of access-on-demand to information and business solutions, mobile apps have become an important asset in the digital world. In the context of the Covid-19 pandemic, app developers have joined the response effort in various ways by releasing apps that target different user bases (e.g., all citizens or journalists), offer different services (e.g., location tracking or diagnost…

    Submitted 15 January, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: Accepted in Empirical Software Engineering under reference: EMSE-D-20-00211R1

  47. arXiv:2006.07087  [pdf, other]

    cs.CY physics.soc-ph q-bio.PE

    Data-driven Simulation and Optimization for Covid-19 Exit Strategies

    Authors: Salah Ghamizi, Renaud Rwemalika, Lisa Veiber, Maxime Cordy, Tegawende F. Bissyande, Mike Papadakis, Jacques Klein, Yves Le Traon

    Abstract: The rapid spread of the Coronavirus SARS-2 is a major challenge that led almost all governments worldwide to take drastic measures to respond to the tragedy. Chief among those measures is the massive lockdown of entire countries and cities, which beyond its global economic impact has created some deep social and psychological tensions within populations. While the adopted mitigation measures (incl…

    Submitted 12 June, 2020; originally announced June 2020.

  48. arXiv:2002.02650  [pdf, other]

    cs.SE

    What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning

    Authors: Patrick Keller, Laura Plein, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon

    Abstract: Recent successes in training word embeddings for NLP tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax tree…

    Submitted 7 February, 2020; originally announced February 2020.

  49. MadDroid: Characterising and Detecting Devious Ad Content for Android Apps

    Authors: Tianming Liu, Haoyu Wang, Li Li, Xiapu Luo, Feng Dong, Yao Guo, Liu Wang, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Advertisement drives the economy of the mobile app ecosystem. As a key component in the mobile ad business model, mobile ad content has been overlooked by the research community, which poses a number of threats, e.g., propagating malware and undesirable contents. To understand the practice of these devious ad behaviors, we perform a large-scale study on the app contents harvested through automated…

    Submitted 5 February, 2020; originally announced February 2020.

    Comments: To be published in The Web Conference 2020 (WWW'20)

  50. arXiv:2001.09148  [pdf, other]

    cs.SE

    Learning to Catch Security Patches

    Authors: Arthur D. Sawadogo, Tegawendé F. Bissyandé, Naouel Moha, Kevin Allix, Jacques Klein, Li Li, Yves Le Traon

    Abstract: Timely patching is paramount to safeguard users and maintainers against dire consequences of malicious attacks. In practice, patching is prioritized following the nature of the code change that is committed in the code repository. When such a change is labeled as being security-relevant, i.e., as fixing a vulnerability, maintainers rapidly spread the change and users are notified about the need to…

    Submitted 24 January, 2020; originally announced January 2020.