Showing 1–26 of 26 results for author: le, W

Searching in archive cs.

  1. arXiv:2403.17218

    cs.SE cs.CR cs.LG

    A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection

    Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Earl T. Barr, Wei Le

    Abstract: Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks. Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems. Precise vulnerability detection requires reasoning about the code, making it a good case study for exploring the limits of LLMs' reasoning capabiliti…

    Submitted 25 March, 2024; originally announced March 2024.

  2. arXiv:2312.01588

    cs.SE cs.LG

    ActiveClean: Generating Line-Level Vulnerability Data via Active Learning

    Authors: Ashwin Kallingal Joshy, Mirza Sanjida Alam, Shaila Sharmin, Qi Li, Wei Le

    Abstract: Deep learning vulnerability detection tools are increasing in popularity and have been shown to be effective. These tools rely on a large volume of high-quality training data, which is very hard to get. Most of the currently available datasets provide function-level labels, reporting whether a function is vulnerable or not vulnerable. However, for vulnerability detection to be useful, we need to…

    Submitted 3 December, 2023; originally announced December 2023.

  3. arXiv:2311.10305

    eess.IV cs.CV

    Semi-supervised ViT knowledge distillation network with style transfer normalization for colorectal liver metastases survival prediction

    Authors: Mohamed El Amine Elforaici, Emmanuel Montagnon, Francisco Perdigon Romero, William Trung Le, Feryel Azzi, Dominique Trudel, Bich Nguyen, Simon Turcotte, An Tang, Samuel Kadoury

    Abstract: Colorectal liver metastases (CLM) significantly impact colon cancer patients, influencing survival based on systemic chemotherapy response. Traditional methods like tumor grading scores (e.g., tumor regression grade - TRG) for prognosis suffer from subjectivity, time constraints, and expertise demands. Current machine learning approaches often focus on radiological data, yet the relevance of histo…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 16 pages, 7 figures and 7 tables. Submitted to Medical Journal Analysis (MedIA) journal

  4. arXiv:2311.04109

    cs.LG cs.CR

    Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection

    Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Shaila Sharmin, Wei Le

    Abstract: Recently, pretrained language models have shown state-of-the-art performance on the vulnerability detection task. These models are pretrained on a large corpus of source code, then fine-tuned on a smaller supervised vulnerability dataset. Due to the different training objectives and the performance of the models, it is interesting to consider whether the models have learned the semantics of code r…

    Submitted 7 November, 2023; originally announced November 2023.

  5. arXiv:2310.07958

    cs.SE cs.CR cs.LG stat.ME

    Towards Causal Deep Learning for Vulnerability Detection

    Authors: Md Mahbubur Rahman, Ira Ceka, Chengzhi Mao, Saikat Chakraborty, Baishakhi Ray, Wei Le

    Abstract: Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still blocks it from being very useful in practice is that the model is not robust under perturbation and cannot generalize well to out-of-distribution (OOD) data, e.g., when a trained model is applied to unseen projects in the real world. We hypothesize that this is because the mo…

    Submitted 14 January, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: ICSE 2024, Camera Ready Version

  6. arXiv:2309.17341

    cs.LG cs.AI cs.CV

    MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search

    Authors: Eliska Kloberdanz, Wei Le

    Abstract: Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference latency, and therefore allows for DNNs to be deployed on platforms with constrained computational resources and real-time systems. However, quantization can lea…

    Submitted 29 September, 2023; originally announced September 2023.

  7. arXiv:2309.11004

    cs.SE

    Reproducing Failures in Fault Signatures

    Authors: Ashwin Kallingal Joshy, Benjamin Steenhoek, Xiuyuan Guo, Wei Le

    Abstract: Software often fails in the field; however, reproducing and debugging field failures is very challenging: the failure-inducing input may be missing, and the program setup can be complicated and hard for developers to reproduce. In this paper, we propose to generate fault signatures from the failure locations and the original source code to reproduce the faults in small executable programs. We sa…

    Submitted 19 September, 2023; originally announced September 2023.

  8. arXiv:2309.09246

    cs.CV

    Image-level supervision and self-training for transformer-based cross-modality tumor segmentation

    Authors: Malo de Boisredon, Eugene Vorontsov, William Trung Le, Samuel Kadoury

    Abstract: Deep neural networks are commonly used for automated medical image segmentation, but models will frequently struggle to generalize well across different imaging modalities. This issue is particularly problematic due to the limited availability of annotated data, making it difficult to deploy these models on a larger scale. To overcome these challenges, we propose a new semi-supervised training str…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 17 pages, 10 figures, 5 tables

  9. arXiv:2307.08947

    cs.SE

    An Effective Data-Driven Approach for Localizing Deep Learning Faults

    Authors: Mohammad Wardat, Breno Dantas Cruz, Wei Le, Hridesh Rajan

    Abstract: Deep Learning (DL) applications are being used to solve problems in critical domains (e.g., autonomous driving or medical diagnosis systems). Thus, developers need to debug their systems to ensure that the expected behavior is delivered. However, it is hard and expensive to debug DNNs. When the failure symptoms or unsatisfied accuracies are reported after training, we lose the traceability as to w…

    Submitted 17 July, 2023; originally announced July 2023.

  10. arXiv:2306.07487

    cs.SE

    TRACED: Execution-aware Pre-training for Source Code

    Authors: Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray

    Abstract: Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully exposed before the real execution. Without an understanding of the program execution, statically pre-trained models fail to comprehensively capture the dynamic…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted by ICSE 2024 (Early Cycle). Camera-ready is in preparation

  11. arXiv:2305.15690

    cs.SE

    Beryllium: Neural Search for Algorithm Implementations

    Authors: Adithya Kulkarni, Mohna Chakraborty, Yonas Sium, Sai Charishma Valluri, Wei Le, Qi Li

    Abstract: In this paper, we explore the feasibility of finding algorithm implementations from code. Successfully matching code and algorithms can help understand unknown code, provide reference implementations, and automatically collect data for learning-based program synthesis. To achieve the goal, we designed a new language named p-language to specify the algorithms and a static analyzer for the p-languag…

    Submitted 1 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  12. arXiv:2305.02515

    cs.SE

    A Study of Static Warning Cascading Tools (Experience Paper)

    Authors: Xiuyuan Guo, Ashwin Kallingal Joshy, Benjamin Steenhoek, Wei Le, Lori Flynn

    Abstract: Static analysis is widely used for software assurance. However, static analysis tools can report an overwhelming number of warnings, many of which are false positives. When static analysis is applied to a new version, a large number of warnings may be relevant only to the old version. Inspecting these warnings is a waste of time and can prevent developers from finding the new bugs in the new version. In…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 11 pages (including references), 12 figures

  13. arXiv:2303.03965

    cs.CV cs.LG

    Comparing 3D deformations between longitudinal daily CBCT acquisitions using CNN for head and neck radiotherapy toxicity prediction

    Authors: William Trung Le, Chulmin Bang, Philippine Cordelle, Daniel Markel, Phuc Felix Nguyen-Tan, Houda Bahig, Samuel Kadoury

    Abstract: Adaptive radiotherapy is a growing field of study in cancer treatment due to its objective of sparing healthy tissue. The standard of care in several institutions includes longitudinal cone-beam computed tomography (CBCT) acquisitions to monitor changes, but these have yet to be used to improve tumor control while managing side-effects. The aim of this study is to demonstrate the clinical value of pre-…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: 11 pages, 3 figures, 2 equations, 2 tables

  14. arXiv:2212.08109

    cs.SE cs.CR cs.LG

    An Empirical Study of Deep Learning Models for Vulnerability Detection

    Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, Wei Le

    Abstract: Deep learning (DL) models of code have recently reported great progress for vulnerability detection. In some cases, DL-based models have outperformed static analysis tools. Although many great models have been proposed, we do not yet have a good understanding of these models. This limits the further advancement of model robustness, debugging, and deployment for vulnerability detection. In this…

    Submitted 12 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: 12 pages, 14 figures. Accepted at ICSE 2023. Camera-ready version

  15. arXiv:2212.08108

    cs.SE cs.CR cs.LG

    Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection

    Authors: Benjamin Steenhoek, Hongyang Gao, Wei Le

    Abstract: Deep learning-based vulnerability detection has shown great performance and, in some studies, outperformed static analysis tools. However, the highest-performing approaches use token-based transformer models, which are not the most efficient to capture code semantics required for vulnerability detection. Classical program analysis techniques such as dataflow analysis can detect many types of bugs…

    Submitted 1 October, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at ICSE 2024 (Early Cycle). Camera-ready version

  16. FuzzerAid: Grouping Fuzzed Crashes Based On Fault Signatures

    Authors: Ashwin Kallingal Joshy, Wei Le

    Abstract: Fuzzing has been an important approach for finding bugs and vulnerabilities in programs. Many fuzzers deployed in industry run daily and can generate an overwhelming number of crashes. Diagnosing such crashes can be very challenging and time-consuming. Existing fuzzers typically employ heuristics such as code coverage or call stack hashes to weed out duplicate reporting of bugs. While these heuris…

    Submitted 2 September, 2022; originally announced September 2022.

    Comments: In 37th IEEE/ACM International Conference on Automated Software Engineering, October 10 to 14, 2022, Rochester, MI, USA. 12 pages

    ACM Class: D.2.5

  17. DeepStability: A Study of Unstable Numerical Methods and Their Solutions in Deep Learning

    Authors: E. Kloberdanz, K. G. Kloberdanz, W. Le

    Abstract: Deep learning (DL) has become an integral part of solutions to various important problems, which is why ensuring the quality of DL systems is essential. One of the challenges of achieving reliability and robustness of DL software is to ensure that algorithm implementations are numerically stable. DL algorithms require a large amount and a wide variety of numerical computations. A naive implementat…

    Submitted 7 February, 2022; originally announced February 2022.

    Comments: to be published in ICSE (2022)

  18. arXiv:2112.04036

    cs.SE cs.LG

    DeepDiagnosis: Automatically Diagnosing Faults and Recommending Actionable Fixes in Deep Learning Programs

    Authors: Mohammad Wardat, Breno Dantas Cruz, Wei Le, Hridesh Rajan

    Abstract: Deep Neural Networks (DNNs) are used in a wide variety of applications. However, as in any software application, DNN-based apps are afflicted with bugs. Previous work observed that DNN bug fix patterns are different from traditional bug fix patterns. Furthermore, those buggy models are non-trivial to diagnose and fix due to inexplicit errors with several options to fix them. To support developers…

    Submitted 7 December, 2021; originally announced December 2021.

    Comments: Accepted at ICSE 2022

  19. arXiv:2106.15498

    cs.LG cs.CL cs.IR

    Classification of Consumer Belief Statements From Social Media

    Authors: Gerhard Johann Hagerer, Wenbin Le, Hannah Danner, Georg Groh

    Abstract: Social media offer plenty of information to perform market research in order to meet the requirements of customers. One way this research is conducted is for a domain expert to gather and categorize user-generated content into a complex and fine-grained class structure. In many such cases, little data meets complex annotations. It is not yet fully understood how this can be leveraged succes…

    Submitted 24 July, 2023; v1 submitted 29 June, 2021; originally announced June 2021.

  20. Validating Static Warnings via Testing Code Fragments

    Authors: Ashwin Kallingal Joshy, Xueyuan Chen, Benjamin Steenhoek, Wei Le

    Abstract: Static analysis is an important approach for finding bugs and vulnerabilities in software. However, inspecting and confirming static warnings are challenging and time-consuming. In this paper, we present a novel solution that automatically generates test cases based on static warnings to validate true and false positives. We designed a syntactic patching algorithm that can generate syntactically v…

    Submitted 28 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis July 11 to 17, 2021, Denmark. 13 pages

    ACM Class: D.2.5; D.2.4; F.3.1

  21. arXiv:2103.03376

    cs.SE

    DeepLocalize: Fault Localization for Deep Neural Networks

    Authors: Mohammad Wardat, Wei Le, Hridesh Rajan

    Abstract: Deep neural networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques do not support localizing DNN bugs because of the lack of understanding of model behaviors. The entire DNN model appears as a black box. To address these problems, we propose an approach that automatically determines whether t…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: Accepted at ICSE 2021

  22. arXiv:2005.14017

    eess.IV cs.CV

    A Normalized Fully Convolutional Approach to Head and Neck Cancer Outcome Prediction

    Authors: William Le, Francisco Perdigón Romero, Samuel Kadoury

    Abstract: In medical imaging, radiological scans of different modalities serve to enhance different sets of features for clinical diagnosis and treatment planning. This variety enriches the source information that could be used for outcome prediction. Deep learning methods are particularly well-suited for feature extraction from high-dimensional inputs such as images. In this work, we apply a CNN classifica…

    Submitted 29 May, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: 6 pages, 1 figure, 1 table, Medical Imaging with Deep Learning 2020 conference

    Report number: MIDL/2020/ExtendedAbstract/JojEzQ3E5n

  23. arXiv:2002.12393

    cs.DB

    Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings

    Authors: Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le

    Abstract: Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost model, therefore, is akin to better resource efficiency and lower operational costs. Unfortunately, the production workloads at Microsoft show that costs are very co…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: To appear at SIGMOD 2020

  24. arXiv:1911.09201

    cs.SE

    Testing Criteria for Mobile Apps Based on Callback Sequences

    Authors: Danilo Dominguez Perez, Wei Le

    Abstract: App quality has been shown to be the most important indicator of app adoption. To assure quality, developers mainly use testing to find bugs in apps and apply structural and GUI test coverage criteria. However, mobile apps have more behaviors than the GUI actions, e.g., an app also handles events from sensors and executes long-running background tasks through Android API calls to Services and AsyncT…

    Submitted 20 November, 2019; originally announced November 2019.

  25. arXiv:1911.07988

    cs.SE

    Invariant Diffs

    Authors: Ashwin Kallingal Joshy, Wei Le

    Abstract: Software development is inherently incremental. Nowadays, many software companies adopt an agile process and a shorter release cycle, where software needs to be delivered faster with quality assurances. On the other hand, the majority of existing program analysis tools still target single versions of programs and are slow and inflexible to handle changes. In the popular version control systems suc…

    Submitted 29 June, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

  26. arXiv:1703.08902

    cs.SE

    Generating Predicate Callback Summaries for the Android Framework

    Authors: Danilo Dominguez Perez, Wei Le

    Abstract: One of the challenges of analyzing, testing and debugging Android apps is that the potential execution orders of callbacks are missing from the apps' source code. However, bugs, vulnerabilities and refactoring transformations have been found to be related to callback sequences. Existing work on control flow analysis of Android apps has mainly focused on analyzing GUI events. GUI events, although…

    Submitted 29 March, 2017; v1 submitted 26 March, 2017; originally announced March 2017.

    Comments: 11 pages